Proceedings of the 10th Student Computing Research Symposium (SCORES’24)
Maribor, Slovenia, October 3, 2024
Niko Lukač, Iztok Fister, Štefan Kohek (Eds.)
https://www.scores.si

Title: Proceedings of the 10th Student Computing Research Symposium (SCORES’24)
Editors: Niko Lukač (University of Maribor, Faculty of Electrical Engineering and Computer Science), Iztok Fister (University of Maribor, Faculty of Electrical Engineering and Computer Science), Štefan Kohek (University of Maribor, Faculty of Electrical Engineering and Computer Science)
Technical editors: Štefan Kohek (University of Maribor, Faculty of Electrical Engineering and Computer Science), Jan Perša (University of Maribor, University Press)
Design: University of Ljubljana, Faculty of Computer and Information Science; University of Maribor, Faculty of Electrical Engineering and Computer Science; University of Primorska, Faculty of Mathematics, Natural Sciences and Information Technologies
Graphic material: Sources are own unless otherwise noted. The authors and Lukač, Fister, Kohek (editors), 2024
Conference: 10th Student Computing Research Symposium (SCORES’24)
Location: University of Maribor, Faculty of Electrical Engineering and Computer Science, Koroška cesta 46, 2000 Maribor, Slovenia
Date: October 3, 2024
Program Committee: Niko Lukač (University of Maribor), Štefan Kohek (University of Maribor), Iztok Fister (University of Maribor), Jan Popič (University of Maribor), Klemen Berkovič (University of Maribor), Slavko Žitnik (University of Ljubljana), Timotej Knez (University of Ljubljana), Domen Šoberl (University of Primorska), Lucija Brezočnik (University of Maribor), Grega Vrbančič (University of Maribor), Jure Zabkar (University of Ljubljana), Matjaž Krnc (University of Primorska), Andrej Brodnik (University of Ljubljana), Mario Gorenjak (University of Maribor), Jani Dugonik (University of Maribor), Peter Rogelj (University of Primorska), Dušan Fister (University of Maribor), Simon Kolmanič (University of Maribor), Uroš Mlakar (University of Maribor), Damjan Strnad (University of Maribor), Marko Bizjak (University of Maribor), Jorge Pérez Aracil (Universidad de Alcalá), Selma Rizvic (University of Sarajevo), Andres Iglesias (University of Cantabria), Magda Gregorova (University of Applied Sciences Würzburg-Schweinfurt), Janez Brest (University of Maribor), Iztok Fister Jr. (University of Maribor), Eneko Osaba (TECNALIA Research & Innovation), Miklós Krész (University of Szeged)
Organizing Committee: Niko Lukač (University of Maribor), Iztok Fister (University of Maribor), Štefan Kohek (University of Maribor), Jan Popič (University of Maribor), Klemen Berkovič (University of Maribor), Andrej Nerat (University of Maribor), Monika Ferk Ovčjak (University of Maribor)
Published by: University of Maribor, University Press, Slomškov trg 15, 2000 Maribor, Slovenia, https://press.um.si, zalozba@um.si
Co-published by: University of Maribor, Faculty of Electrical Engineering and Computer Science, Koroška cesta 46, 2000 Maribor, Slovenia, http://www.feri.um.si, feri@um.si
Partners: University of Primorska, Faculty of Mathematics, Natural Sciences and Information Technologies, Glagoljaška 8, 6000 Koper, Slovenia, https://www.famnit.upr.si/en, info@famnit.upr.si; University of Ljubljana, Faculty of Computer and Information Science, Večna pot 113, 1000 Ljubljana, Slovenia, https://www.fri.uni-lj.si/en, dekanat@fri.uni-lj.si
Edition: 1st
Publication type: E-book
Published: Maribor, Slovenia, October 2024
ISBN: 978-961-286-914-4
DOI: https://doi.org/10.18690/um.feri.6.2024
Available at: https://press.um.si/index.php/ump/catalog/book/886
Organisers and sponsors: Inštitut za računalništvo
© University of Maribor, University Press. Text © Authors & Lukač, Fister, Kohek (editors), 2024.
This book is published under a Creative Commons 4.0 International licence (CC BY 4.0). This license allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use. Any third-party material in this book is published under the book’s Creative Commons licence unless indicated otherwise in the credit line to the material. If you would like to reuse any third-party material not covered by the book’s Creative Commons licence, you will need to obtain permission directly from the copyright holder. https://creativecommons.org/licenses/by/4.0/
Price: Free copy
For publisher: Prof. Dr. Zdravko Kačič, Rector of University of Maribor
Attribution: Lukač, N., Fister, I., Kohek, Š. (eds.) (2024). Proceedings of the 10th Student Computing Research Symposium (SCORES’24). University of Maribor, University Press. doi: 10.18690/um.feri.6.2024

CIP - Kataložni zapis o publikaciji
Univerzitetna knjižnica Maribor
004(0.034.2)
STUDENT Computing Research Symposium (2024 ; Maribor)
Proceedings of the 10th Student Computing Research Symposium [Elektronski vir] : (SCORES'24) : Maribor, Slovenia, October 3, 2024 / Niko Lukač, Iztok Fister, Štefan Kohek (eds.). - 1st ed. - E-knjiga. - Maribor : University of Maribor, University Press, 2024
Način dostopa (URL): https://press.um.si/index.php/ump/catalog/book/886
ISBN 978-961-286-914-4 (Pdf)
doi: 10.18690/um.feri.6.2024
COBISS.SI-ID 212751875
Editors’ Foreword

In the realm of computer science, where innovation continually reshapes our understanding of technology, the 2024 Student Computing Research Symposium (SCORES 2024) marks an important moment of progress and collaboration. This year, the Faculty of Electrical Engineering and Computer Science at the University of Maribor (UM FERI) leads the organization of SCORES, in partnership with the University of Ljubljana and the University of Primorska. These institutions have come together to provide a platform for undergraduate and graduate students, fostering their contributions to the field. This year, we are also honored to have the program committee extended with renowned international researchers. Their expertise has enriched the conference, ensuring a high standard of academic rigor and a diverse range of perspectives.

SCORES 2024 is dedicated to supporting the next generation of computer science postgraduates, offering them a stage to present their research, exchange ideas, and engage with the challenges that lie ahead. Recent advancements in artificial intelligence and data science have underscored the need for fresh perspectives and new approaches. This year’s symposium features a diverse range of research, including advancements in emotion recognition technologies, computational problem-solving, and the application of video analysis in healthcare. The program also explores new methods in skill modeling, decision-making processes, and language analysis in clinical settings. Additionally, it covers innovations in device localization techniques and developments in object detection within digital environments.

As we review the ideas and research at SCORES 2024, we see the beginnings of work that will influence the future of computer science. The ideas and innovations shared here are not just academic exercises; they represent the next steps in the evolution of technology, driven by the vision and dedication of these talented students.

Finally, special thanks to the Institute of Computer Science at UM FERI as the main conference sponsor, and to the UM FERI leadership for the hospitality.

Editors: Niko Lukač, Iztok Fister, Štefan Kohek
Conference Program

Plenary Speakers
1  SCORES’24: History, mission and vision / Iztok Fister
2  Evolutionary Computation: Overview, Trends and Perspectives / Bogdan Filipič

Section 1: Advances in Graph Theory and Algorithmic Solutions (Chairman: Štefan Kohek)
3  Influence of Graph Characteristics on Solutions of Feedback Arc Set Problem / Ema Leila Grošelj, Tomaž Poljanšek
7  Learning Multi-Level Skill Hierarchies with Graphwave / Simon Bele, Jure Žabkar
11  Integration of Named Entity Extraction Based on Deep Learning for Neo4j Graph Database / Lea Roj, Štefan Kohek, Aleksander Pur, Niko Lukač
15  Efficient Implementation of Spreadsheet User Application / Tjaša Repič, Aljaž Jeromel, Sašo Piskar, Domen Dolanc, Niko Lukač
19  Improvement and Evaluation of a Heuristic Method for the Minimal Feedback Arc Set Problem / Jure Pustoslemšek, Ema Črne, Nejc Rihter

Section 2: Image Processing, Computer Vision, and NLP Applications (Chairman: Grega Vrbančič)
23  Counter-Strike Character Object Detection via Dataset Generation / Matija Šinko
33  Cross-Lingual False Friend Classification via LLM-based Vector Embedding Analysis / Mitko Nikov, Žan Tomaž Šprajc, Žan Bedrač
37  Analyzing Tourist Destinations in Belgrade using Geotagged Photos from Flickr / Vera Milosavljević, Dejan Paliska
41  Volleyball Game Analysis Using Computer Vision Algorithms / Marko Plankelj, Uroš Mlakar

Section 3: Machine Learning and Data Analytics in Various Domains (Chairman: Marko Bizjak)
45  A Bayesian Approach to Modeling GPS Errors for Comparing Forensic Evidence / Nika Molan, Ema Leila Grošelj, Klemen Vovk
49  Seven Components of Computational Thinking: Assessing the Quality of Dr. Scratch Metrics Using 230,000 Scratch Projects / Gal Bubnič, Tomaž Kosar, Bostjan Bubnic
53  Machine Learning Approaches to Forecasting the Winner of the 2024 NBA Championship / Hana Zadravec
57  Hit Song Prediction Through Machine Learning and Spotify Data / Andrej Natev
61  A Data-Driven Approach for the Analysis of Ridership Fluctuations in Transit Systems / Jovan Pavlović, Miklós Krész, László Hajdu, András Bóta

Section 4: Machine Learning Applications in Neuroscience and Healthcare (Chairman: Uroš Mlakar)
65  Automatic Assessment of Bradykinesia in Parkinson’s Disease Using Tapping Videos / Matjaž Zupanič, Dejan Georgiev, Jure Žabkar
69  Exploring Mathematical Decision-Making Through EEG Analysis / Riste Micev, Peter Rogelj
73  Analysis of Verbal Fluency in Slovenian Language in Patients With Schizophrenia / Mila Marinković, Polona Rus Prelog, Martina Zakšek, Jure Žabkar

Index of Authors  77
Plenary Speakers

Iztok Fister (University of Maribor, Faculty of Electrical Engineering and Computer Science, Maribor, Slovenia)
SCORES’24: History, mission and vision

This year, the International Student Conference in Computer Science celebrates its first decade. The conference first emerged in 2014 at FERI Maribor and has continued to date. The only interruption the conference experienced was during the Corona crisis in 2020. The keynote focuses on the history, mission, and future of the student conference that started under the name Student Computer Science Research Conference (StuCoSRec) in 2014 and was renamed in 2022, on the initiative of the then organizer FRI Ljubljana, to Student Computing Research Symposium (SCORES). Right from the start, the primary mission of the conference was to connect the students of the most important computer science faculties in Slovenia (i.e., FERI MB, FRI LJ, and FAMNIT KP) and to foster them in publicly publishing the results of their seminar or individual research projects. In line with this, the location of the conference changed each year according to the current organizer. These conferences are also a place for making new acquaintances among students that could remain active throughout their whole lives. In the last three years, the conference has experienced many improvements: keynote speakers and a best paper award were introduced, the review process was strengthened, and the conference organization went through radical automation. The heads of the home faculties have also started to treat the conference as their own. At FERI, the Institute of Computer Science even took on the role of the main sponsor of this conference. Looking into the future, we can observe that the conference is gaining more and more importance in computer science, with students being aware of the importance of the flow of knowledge and experience. As a result, this conference, which is free of charge, can bring students new views on the problems being solved and also open new ways of finding solutions. Therefore, the Steering Committee needs to take care of broader internationalization. Finally, I wish the conference a long life and as smooth a path as possible in attracting many new, high-quality papers.

Bogdan Filipič (Jožef Stefan Institute, Ljubljana, Slovenia)
Evolutionary Computation: Overview, Trends and Perspectives

Evolutionary computation is a computational intelligence methodology dealing with theoretical studies, design and applications of search and optimization algorithms known as evolutionary algorithms. These algorithms mimic biological evolution when iteratively searching for solutions to a given problem. They are well suited for solving black-box optimization problems where no mathematical formulation is available and problem properties are unknown. In this presentation, we first outline different types of evolutionary algorithms and a unified approach to handling them, as well as their advantages and disadvantages. We then illustrate their capabilities with examples of successful applications to challenging real-world problems. Next, we overview current trends in evolutionary computation, including the efforts of the community in moving beyond metaphor-based algorithms, recent approaches to problem characterization aimed at better problem understanding, and machine learning of algorithm performance prediction. We conclude with future perspectives, highlighting the need for further research on understandability and explainability in evolutionary computation, and the potential utilization of generative artificial intelligence techniques.
Influence of Graph Characteristics on Solutions of Feedback Arc Set Problem

Ema Leila Grošelj (eg61487@student.uni-lj.si), University of Ljubljana, Faculty of Computer and Information Science, Ljubljana, Slovenia
Tomaž Poljanšek (tp51101@student.uni-lj.si), University of Ljubljana, Faculty of Computer and Information Science, Ljubljana, Slovenia
DOI: https://doi.org/10.18690/um.feri.6.2024.1

ABSTRACT
In this article we present the Feedback Arc Set problem and how certain graph characteristics impact the results of heuristic algorithms. We then inspect how the most promising characteristic (treewidth) helps in choosing the most appropriate heuristic for a given graph.

KEYWORDS
graph, FAS, heuristics, treewidth, random forest classifier

1 INTRODUCTION
In this article, we tackle the feedback arc set problem, where the goal is to find the smallest set of directed edges (arcs) in a directed graph such that, when removed, the graph becomes acyclic (a directed acyclic graph, DAG). We were interested in determining which characteristics of a graph suggest that a particular heuristic method might perform poorly, providing a solution not close to the optimum. Since its influence is trivial, we were not interested in the impact of graph size and aimed to normalize this effect. We collected graphs from two existing graph datasets. From these, we built a database of their strongly connected components.
The database of components was enriched with characteristics: number of nodes and arcs, graph density, radius and diameter, information on whether the graph is planar or bipartite, node connectivity, transitivity and treewidth. We also added information about the distribution of some characteristics computed on individual nodes (e.g., degree and different types of centrality).

2 FEEDBACK ARC SET
First, let us state that from now on, when we say 'graph', we mean a directed and strongly connected graph with $n$ nodes and $m$ arcs.

Definition 2.1. A feedback arc set (FAS) of a graph $G = (V, A)$ is $A' \subseteq A$ such that $G' = (V, A \setminus A')$ is a DAG. MFAS (minimum FAS) is the smallest possible FAS.

In this article, we aim to approximate the MFAS size using heuristics, as the FAS problem is one of Karp's 21 famous NP-complete problems [10]. Unfortunately, it also does not have an approximation scheme. If a graph is already acyclic, its FAS is empty.

2.1 Upper Bound
Every arc in an MFAS lies in at least one cycle. If an arc does not lie in a cycle, it does not need to be removed from the graph; this would contradict the minimality of the MFAS. Thus, to break all cycles, it is sufficient to remove one arc from each cycle. However, the FAS composed of these arcs is not necessarily the MFAS, as removing one arc can break multiple cycles simultaneously, requiring fewer arcs to be removed than there are cycles. Therefore, the number of cycles is an upper bound on the size of the MFAS.
However, this bound can be very loose, as shown in [3], where a graph with $n$ nodes and $m$ arcs can have up to $1.433^m$ cycles, and the number of arcs can be quadratic in the number of nodes, i.e., $m = O(n^2)$. In addition to being loose, counting the number of cycles is computationally demanding. For example, the algorithm in the Python library NetworkX [7] requires $O((n + m)(c + 1))$ time steps, where $c$ is the number of cycles. Since the number of cycles is potentially exponential in the number of vertices, we can only count all cycles for small graphs in a reasonable time.
Another upper bound, adequate also for large graphs, is the best result of the heuristics. This is much better, since it is computed much faster. Thus, in this article we take the best heuristic result as an upper bound.

2.2 Lower Bound
For the lower bound, we can use disjoint cycles in the graph. They provide a lower bound because we need to break all of these (disjoint) cycles, and due to disjointness, we cannot break two cycles by removing one arc.
Note that not all (exhaustive) sets of disjoint cycles in a graph are equally strong. For instance, consider a graph with arcs $A = \{(1, 2), (2, 3), (3, 2), (3, 1), (1, 3)\}$. This graph has three cycles and two sets of disjoint cycles: $\{(1, 2, 3)\}$ and $\{(1, 3), (2, 3)\}$. The idea is that by removing the cycle $(1, 2, 3)$ from the graph, we also break all other cycles (exhausting the disjoint cycles).
The set of disjoint cycles we get depends on the way we search for one cycle within each iteration. If we introduce randomness in selecting the starting node and the order of nodes during the search, we obtain a randomized algorithm. When running the algorithm multiple times, we only consider the largest set, as we aim for a tighter lower bound. In this article we ran the search for disjoint cycles 10 times.
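The randomized search for arc-disjoint cycles can be sketched as follows. This is a minimal illustration using NetworkX; the function names are ours and not taken from the authors' repository:

import random
import networkx as nx

def shuffled_copy(G):
    # NetworkX visits nodes and edges in insertion order, so inserting
    # them in random order randomizes where the cycle search starts and
    # how it proceeds.
    H = nx.DiGraph()
    nodes, edges = list(G.nodes), list(G.edges)
    random.shuffle(nodes)
    random.shuffle(edges)
    H.add_nodes_from(nodes)
    H.add_edges_from(edges)
    return H

def disjoint_cycles_lower_bound(G, runs=10):
    # Largest number of arc-disjoint cycles found over several randomized
    # runs; each found cycle has its arcs removed, so later cycles stay
    # arc-disjoint from it.
    best = 0
    for _ in range(runs):
        H = shuffled_copy(G)
        count = 0
        while True:
            try:
                cycle = nx.find_cycle(H)
            except nx.NetworkXNoCycle:
                break
            H.remove_edges_from(cycle)
            count += 1
        best = max(best, count)
    return best

Keeping only the largest count over the runs matches the goal stated above: any set of arc-disjoint cycles is a valid lower bound, so the biggest one found is the tightest.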
3 DATA
3.1 Data Sources
We collected graphs from two sources.
In [8], different generation methods were used to build a collection of large weighted multi-digraphs with different topologies. The graphs there were nicely divided into groups: De Bruijn graphs, Delaunay 3D graphs, Kautz graphs, Triangulation graphs, Small world graphs and Random graphs. They were derived from the ISPD98 Circuit Benchmark Suite [2].
A collection of graphs presented in [5] was intended for the problem of optimum cycle mean and ratio. It consists of graphs from the ISPD98 Circuit Benchmark Suite [2] and random graphs that the authors generated themselves. In this article we refer to them as unclassified.
We read all these graphs and broke them into strongly connected components, as breaking a graph into such components is usually the first step in heuristics. We wanted to ensure that the sizes of the components (e.g., many small ones and one large one) did not obscure the impact of other interesting characteristics. We then stored the components in pickle format for faster re-reading. If a graph contained loops (self-directed arcs), we removed them beforehand and added them to the result at the end, as not all heuristics supported graphs with loops. Sixteen graphs with either more than 5000 nodes or more than 10000 arcs were classified as 'large' graphs and omitted from the study. Thus, the main database contained 11925 graphs.
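The preprocessing just described can be sketched as follows (NetworkX digraphs are assumed; the names are illustrative):

import pickle
import networkx as nx

def to_components(G):
    # Self-loops belong to every FAS trivially, so they are set aside
    # first and re-added to the result at the end. Note that G is
    # modified in place here.
    loops = list(nx.selfloop_edges(G))
    G.remove_edges_from(loops)
    # Each strongly connected component is an independent subproblem;
    # single-node components are acyclic and can be skipped.
    comps = [G.subgraph(c).copy()
             for c in nx.strongly_connected_components(G) if len(c) > 1]
    return comps, loops

G = nx.DiGraph([(1, 2), (2, 1), (2, 2), (3, 4)])  # toy example
comps, loops = to_components(G)
with open("components.pickle", "wb") as f:
    pickle.dump(comps, f)  # stored for faster re-reading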
3.2 Graph Characteristics
For every graph in our database we saved the number of nodes, number of arcs, graph density, planarity, bipartiteness, diameter, radius, node connectivity (the minimum number of nodes that need to be removed to disconnect the graph), transitivity (the probability that the ends of two arcs that share a common node are themselves connected), and treewidth (see Subsection 3.2.1).
For nodes, we calculated degree, closeness centrality (the inverse average shortest path length from the node to all other nodes), betweenness centrality (the frequency with which a node is part of the shortest paths between other nodes), degree centrality (the fraction of nodes it is connected to), clustering coefficient (how many triangles a node is part of out of the possible triangles), node centrality (node influence within the network, considering both the number of neighbors and neighbors of neighbors), and PageRank (a rank obtained by the PageRank algorithm, measuring the importance of a node). For each graph, we recorded the min, max, median, and interquartile range of these values.

3.2.1 Treewidth. Treewidth represents how close a graph is to being a tree, with trees having a treewidth of 1. It is the minimum width over all tree decompositions of the graph, where the width of a decomposition is determined by its largest bag. The treewidth of an undirected graph was extended to directed graphs in [9].
According to [6], for a directed graph $G$ it holds that

directed_treewidth(G) ≤ treewidth(undirected(G)).

Equality is achieved when all arcs are bidirectional. Two heuristics, treewidth_min_degree and treewidth_min_fill_in, implemented in the NetworkX library [7], provide an upper bound for the treewidth, which also serves as an upper bound for the directed treewidth. The minimum degree heuristic repeatedly selects and removes the node with the lowest degree, while the minimum fill-in heuristic selects the node whose removal minimizes the number of added arcs needed to make its neighborhood a clique.
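A sketch of how the treewidth feature can be obtained with the two NetworkX heuristics named above (the helper function is ours; per the inequality, the bound is computed on the underlying undirected graph):

import networkx as nx
from networkx.algorithms.approximation import (
    treewidth_min_degree,
    treewidth_min_fill_in,
)

def treewidth_upper_bound(G):
    # Both heuristics work on undirected graphs; nx.Graph(...) collapses
    # parallel arcs if present. Each call returns (width, decomposition);
    # the smaller of the two widths is kept as the feature value.
    U = nx.Graph(G.to_undirected())
    w_deg, _ = treewidth_min_degree(U)
    w_fill, _ = treewidth_min_fill_in(U)
    return min(w_deg, w_fill)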
4 HEURISTICS
We tested five different heuristics.

4.1 SmartAE
We implemented the SmartAE algorithm from article [4]. First, we order the nodes, for example by indegree. Then we remove outgoing arcs pointing to unvisited nodes in that order until the graph becomes acyclic. After obtaining an acyclic graph, we attempt to re-add the removed arcs one by one if they do not cause cycles.

4.2 DiVerSeS - Using FVS Heuristics
The size of the FAS equals the size of the FVS (feedback vertex set, the dual problem) on the line graph. The line graph is obtained by mapping arcs to nodes. In the line graph, two nodes representing arcs $(u, v)$ and $(w, x)$ from the original graph are connected with an arc $(uv, wx)$ if $v = w$, meaning each arc in the line graph represents a directed path of length two in the original graph. This transformation is described in [12]. Line graphs are typically larger, making the problem harder. The increase in size on a subsample of graphs from our database is depicted in Figure 1.

Figure 1: Increase in the number of nodes (left) and arcs (right) in the line graph.

After the transformation, we ran the winning FVS solver DiVerSeS from the PACE 2022 challenge [1], which dealt with FVS on directed graphs. We gave it 5 seconds to return a solution. If no solution was found, we ran it for 10 and then 40 seconds. Even then some graphs remained unsolved.

4.3 Graph hierarchy based approaches
We ran algorithms from [11]. The goal is to break cycles while still preserving the logical structure (hierarchy) as much as possible. The hierarchy information identifies which edges need to be removed. The heuristics differ in the way they determine the hierarchy, based on different features. We decided to test three approaches: greedy (FAS Greedy), PageRank rating (PageRank SCC) and a Bayesian skill rating system (TrueSkill SCC), giving us another three heuristics. FAS Greedy was run 5 times, as it uses randomness and is also by far the fastest.

5 METHODOLOGY
We ran the heuristics on the dataset and saved the size of the FAS and the running time.¹ For each graph we determined the heuristic method that found the smallest FAS. On a tie, the heuristic method with the lower running time won.
To determine which graph features are most important in deciding which heuristic method is best to use, we used a random forest classifier: it is stable and not too difficult to explain. We trained and evaluated the model using 5-fold cross-validation. Then we evaluated feature importances. First, we used the model's attribute feature_importances_, which represents the accumulation of the impurity decrease within each tree. We also tested the importance of features using a permutation test: we randomly permuted the values in one column at a time and observed the performance degradation.

¹ Implementations and dataset can be found at https://github.com/elgroselj/FAS.
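A sketch of the classification and importance analysis, assuming scikit-learn (the paper does not name its implementation) and a precomputed feature matrix X with best-heuristic labels y (the file names are hypothetical):

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import cross_val_score

# X: one row of graph characteristics per graph; y: best heuristic per graph.
X, y = np.load("features.npy"), np.load("labels.npy")  # hypothetical files

clf = RandomForestClassifier(random_state=0)
print("5-fold CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())

clf.fit(X, y)
# Impurity-based importance: accumulated impurity decrease per feature.
impurity_imp = clf.feature_importances_
# Permutation importance: permute one column at a time and record the
# drop in accuracy.
perm_imp = permutation_importance(clf, X, y, n_repeats=10, random_state=0)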
6 RESULTS
6.1 Heuristics
As shown in Figure 2, the DiVerSeS solver was the best on the majority of the graphs. However, for some graph groups other heuristics gave better results, as shown in Figure 3. It turns out that on unclassified graphs the FAS Greedy method reported the best results. On Kautz graphs we recommend the PageRank SCC method, while the DiVerSeS method dominated on all other graph groups.

Figure 2: Best solvers.
Figure 3: Best solvers by categories.

If we closely examine the differences between the lower and upper bounds in Figure 4, we see that in most cases the solutions are well constrained; that is, the lower and upper bound are relatively close, giving us a narrow interval of possible FAS sizes. We proved optimality in 26.9% of the examples. In Figure 5 we see that in most cases the ratio between the worst and best solutions is lower than two. We have clipped the chart in Figure 5 to show the great majority of ratios; however, there are some individual cases where the ratio is quite high (the most extreme example has a ratio of 17.3). For these examples it is good to know which heuristic works best, as it makes a lot of difference.

Figure 4: Gap between lower and upper bound.
Figure 5: Histogram of the ratios between upper and lower bound.

6.2 Classification
In classification with the random forest classifier we achieved an accuracy of 0.932. This is a significant improvement over the majority classifier (which predicts DiVerSeS as the winner for all inputs), with an accuracy of 0.707. The feature importance provided by the model's feature_importances_ attribute is shown in Figure 6, while the permutation importance is shown in Figure 7. Features with very little importance are left out.

Figure 6: Feature importance.
Figure 7: Permutation importance.

We can see that treewidth is the most important characteristic according to both figures. This is not very surprising, since by removing edges we create an acyclic graph or a tree. The characteristics pagerank_max, the number of arcs $m$ and Katz centrality_min also have significant importance. It is also notable that while the number of nodes $n$ has accumulated a lot of impurity decrease according to the model's feature importance, it lacks innovativeness in the sense that permuting it randomly does not affect the success much, which suggests that $n$ does not provide new information.
Figure 8 shows that DiVerSeS generally does best on graphs with treewidth at least 10 (this is also true for graphs with treewidth ≥ 100). Below that, the FAS Greedy heuristic gives the best results.

Figure 8: Histograms of treewidths by best solvers.

7 CONCLUSIONS
Treewidth is the most important graph characteristic in determining the best heuristic for a graph. The number of arcs and Katz centrality also have a significant impact. For graphs with higher treewidth we recommend using DiVerSeS, and for lower treewidth the FAS Greedy heuristic.

ACKNOWLEDGEMENTS
We sincerely thank assist. prof. dr. Uroš Čibej for his advice, guidance and for introducing us to this topic.

REFERENCES
[1] 2022. PACE 2022. https://pacechallenge.org/2022/tracks/. [Accessed 26-05-2024].
[2] Charles J. Alpert. 1998. The ISPD98 circuit benchmark suite. In Proceedings of the 1998 International Symposium on Physical Design. 80-85.
[3] Andrii Arman and Sergei Tsaturian. 2017. The maximum number of cycles in a graph with fixed number of edges. arXiv preprint arXiv:1702.02662 (2017).
[4] C. Cavallaro, V. Cutello, and M. Pavone. 2023. Effective heuristics for finding small minimal feedback arc set even for large graphs. In CEUR Workshop Proceedings, Vol. 3606. https://ceur-ws.org/Vol-3606/paper56.pdf
[5] Ali Dasdan. 2004. Experimental analysis of the fastest optimum cycle ratio and mean algorithms. ACM Transactions on Design Automation of Electronic Systems 9, 4 (2004), 385-418. https://doi.org/10.1145/1027084.1027085
[6] Frank Gurski, Dominique Komander, and Carolin Rehs. 2021. How to compute digraph width measures on directed co-graphs. Theoretical Computer Science 855 (2021), 161-185.
[7] Aric A. Hagberg, Daniel A. Schult, and Pieter J. Swart. 2008. Exploring network structure, dynamics, and function using NetworkX. In Proceedings of the 7th Python in Science Conference (SciPy2008), Gäel Varoquaux, Travis Vaught, and Jarrod Millman (Eds.). Pasadena, CA, USA, 11-15.
[8] Michael Hecht, Krzysztof Gonciarz, and Szabolcs Horvát. 2021. Tight localizations of feedback sets. Journal of Experimental Algorithmics (JEA) 26 (2021), 1-19.
[9] Thor Johnson, Neil Robertson, Paul D. Seymour, and Robin Thomas. 2001. Directed tree-width. Journal of Combinatorial Theory, Series B 82, 1 (2001), 138-154.
[10] Richard M. Karp. 2010. Reducibility among combinatorial problems. Springer.
[11] Jiankai Sun, Deepak Ajwani, Patrick K. Nicholson, Alessandra Sala, and Srinivasan Parthasarathy. 2017. Breaking cycles in noisy hierarchies. In Proceedings of the 2017 ACM on Web Science Conference. 151-160.
[12] Jin-Hua Zhao and Hai-Jun Zhou. 2016. Optimal disruption of complex networks. arXiv preprint arXiv:1605.09257 (2016).
Learning Multi-Level Skill Hierarchies with Graphwave

Simon Bele (sb95099@student.uni-lj.si), University of Ljubljana, Faculty of Computer and Information Science, Ljubljana, Slovenia
Jure Žabkar (jure.zabkar@fri.uni-lj.si), University of Ljubljana, Faculty of Computer and Information Science, Ljubljana, Slovenia
DOI: https://doi.org/10.18690/um.feri.6.2024.2

ABSTRACT
We introduce a novel framework for learning multi-level skill hierarchies in reinforcement learning environments by leveraging structural similarities in state-space graphs. To obtain structural embeddings, we use the Graphwave algorithm, which places structurally similar states in close proximity in the latent space. In the latent space, we perform hierarchical clustering of states while respecting the topology of the state-space graph. At different levels of the hierarchy we learn the options that represent the skills; a skill at each level of the hierarchy is defined using the skills from the level below. We compare our approach with the state-of-the-art method across several environments. Our results show that structural embeddings can speed up option learning significantly in certain domains.

KEYWORDS
skill hierarchy, reinforcement learning, options, structural similarity, graph embeddings

1 INTRODUCTION
In Reinforcement Learning (RL), an agent learns to make decisions by interacting with an environment; it operates on the principles of trial and error and obtains positive or negative feedback (rewards) from the environment. The overall goal of the agent is to maximize the cumulative rewards. Traditional RL approaches can struggle with scalability and efficiency as the complexity of the environment increases or the tasks become increasingly difficult.
A possible way to tackle this challenge is to introduce skill hierarchies in RL [1, 6]. Skill hierarchies enable the decomposition of complex tasks into simpler sub-tasks, usually improving the generalization of learned behaviors across different scenarios. This usually leads to a more efficient learning process, but also produces more robust and interpretable actions.
Traditional approaches to developing these hierarchies have primarily focused on single-level structures, where skills are often defined through predefined policies or through the clustering of state transitions, without considering the deeper structural relationships between these transitions. Recently, Evans et al. [3] introduced a method for learning skill hierarchies based on Louvain clustering of the state-space graph, which optimizes its modularity.
In this paper, we introduce an approach that goes beyond modularity: we use the Graphwave algorithm, which identifies structural similarities within a graph. We cluster structural embeddings in latent space, thus providing a more robust foundation for skill learning. Our approach also preserves the topology of the state graph and so enables us to learn the options framework on the obtained clustering.
We evaluate our method by comparing it to the approach of Evans et al. [3]. We integrate our code into their framework and observe the learning efficiency on four domains. We show that in three out of four, our method performs significantly better, while in the Four Rooms domain, which features extreme modularity, the approach of Evans et al. outperforms ours.
marily focused on single-level structures, where skills are often Several approaches leverage the state transition graph of the defined through predefined policies or through the clustering of underlying MDP to learn skills. A notable advancement by Xu et al. state transitions without considering the deeper structural relation- [9] employs the Louvain graph clustering algorithm to partition the ships between these transitions. Recently, Evans et al. [3] introduced state transition graph into clusters, subsequently defining options a method for learning skill hierarchies based on Louvain clustering as traversals across the aggregate graph of these clusters. of the state-space graph, which optimizes its modularity. Previous efforts to create single-level skill hierarchies have pri-In this paper, we introduce an approach that goes beyond modu-marily utilized different measures of centrality or various graph larity: we use the Graphwave algorithm that identifies structural partitioning algorithms. similarities within a graph. We cluster structural embeddings in Evans et al. [3] introduce a multi-level skill hierarchy trained on latent space, thus providing a more robust foundation for skill learn-the entire hierarchical clustering of Louvain. Due to the intractable ing. Our approach also preserves the topology of the state graph problem of modularity maximization, the greedy-natured Louvain DOI: https://doi.org/10.18690/um.feri.6.2024.2 ISBN 978-961-286-914-4 7 Proc. of the 10th Student Computing Research Symposium (SCORES’24), Maribor, Slovenia, October 3, 2024 Simon Bele and Jure Žabkar algorithm optimizes for moving nodes between partitions at each 3.3 Option learning step if and only if this move results in a positive modularity gain. For the sake of comparisons with Evans et al. [3], we similarly This can be seen as a local approach to modularity optimization. construct the skill hierarchy as follows. They employ macro-Q learning [5] and intra-option learning [7] to Let ℎ represent the number of partitions produced by our algo-train hierarchical agents. rithm when applied to the state transition graph. Each of these ℎ The above approach is novel in producing the first multi-level partitions defines a skill layer, forming an action hierarchy with skill hierarchy, where it is produced automatically with no human ℎ levels of abstract actions above primitive actions. Each hierar-intervention. Through it they obtain options reflecting optimizing chy level consists of skills designed to efficiently navigate between for modularity, which they show to be useful for navigating at the clusters of the state transition graph. top-most level of the skill hierarchy. We define options for moving from cluster 𝑐𝑖 to cluster 𝑐𝑗 is defined by: initiation states in 𝑐 3 METHODOLOGY 𝑖 , a policy to navigate from any state in 𝑐𝑖 to a state in 𝑐𝑗, and termination upon reaching 𝑐𝑗. 3.1 Structural similarity embeddings Leveraging the hierarchical structure of the partitions, we define To obtain an embedding of nodes that places structurally similar skills at each level of the hierarchy using the skills from the pre-nodes in close proximity within the latent space, we employ Graph-ceding level. At each hierarchy level, the policies for higher-level wave [2], a methodology that provides strict guarantees regarding actions call actions (either options or primitive actions) from the the separation of structurally equivalent nodes. 
level below, with primitive actions only invoked directly at the base Consider an undirected graph level. 𝐺 = (𝑉 , 𝐸) with its graph Lapla- cian defined as 𝐿 = 𝐷 − 𝐴, where 𝐷 is the degree matrix and 𝐴 is the adjacency matrix. Let the eigenvector decomposition of 𝐿 be 4 EVALUATION given by 4.1 Domains 𝐿 = 𝑈 Λ𝑈𝑇 , (1) The skill hierarchy is evaluated in four environments (Figure 1), the where 𝑈 is the matrix of eigenvectors and Λ is the diagonal matrix first three of which are different examples of the rooms environment. of eigenvalues. An empty room, two rooms connected by a bottleneck and four By applying a heat diffusion wavelet 𝑔 rooms as in [3]. The agent is given a starting position and a goal 𝑠 (𝜆) = 𝑒 −𝜆𝑠 , define the spectral graph wavelet centered at node 𝑎 as position and attempts to navigate between them as effectively as possible. The last domain we look at is the Towers of Hanoi, a Ψ𝑎 = 𝑈 Diag(𝑔𝑠 (𝜆1), . . . , 𝑔𝑠 (𝜆𝑁 ))𝑈𝑇 𝛿𝑎, (2) classic mathematical puzzle. It involves moving a set of disks from where one peg to another, following specific rules. 𝛿𝑎 is the Dirac delta function at node 𝑎. To circumvent computational intractability [2], the wavelet is Each of the rooms environments feature four basic movements: treated as a probability distribution over the graph: north, south, east, and west. These movements steer the agent in the chosen direction unless obstructed by a wall, in which case the 1 𝑁 ∑︁ agent stays in place. Each action incurs a penalty of -0.001, with 𝜙𝑎 (𝑡) = 𝑒𝑖𝑡Ψ𝑚𝑎 , (3) a bonus of +1.0 awarded upon reaching the goal state. Each run 𝑁 𝑚=1 begins from a designated start state and aims for a goal state. for time point 𝑡. The empirical characteristic function is then sam-The Towers of Hanoi involves four disks of varying sizes posi-pled and transformed into a vector embedding: tioned on three pegs. An episode commences with all disks stacked on the leftmost peg. Actions involve moving the top disk from one 𝜒𝑎 = [Re(𝜙𝑎 (𝑡𝑖)), Im(𝜙𝑎 (𝑡𝑖))]𝑡 , (4) 1,...,𝑡 peg to another, ensuring no larger disk is placed on a smaller one. 𝑑 with Each action incurs a -0.001 penalty, with an additional +1.0 reward 𝑑 being the number of samples. This resulting 2 granted upon achieving the goal state, which is when all disks are 𝑑-dimensional embedding ensures that struc- turally equivalent nodes in the graph will be at most a predefined stacked on the rightmost peg. 𝜖 distance apart in the 𝓁2 norm, thereby providing rigorous guarantees on the proximity of such nodes [2]. 4.2 Structural Skill Hierarchy The hierarchy obtained through the above clustering method will 3.2 Clustering cluster structurally similar nodes together. To hierarchically cluster nodes based on their embeddings, our To showcase an example, we show the hierarchical clustering approach utilizes an agglomerative clustering algorithm. of Two Rooms (Figure 2), at the lowest level the walls of the room This algorithm iteratively merges the nearest clusters based as well as the bottleneck state are clustered together. The corners on the average linkage criterion. To maintain the integrity of the of the room are given their own individual clusters and then the graph’s topology, clusters are only compared if there exists a direct center of the room is partitioned symmetrically with respect to the path between them that bypasses other clusters. The height of the bottleneck state. 
3.2 Clustering
To hierarchically cluster nodes based on their embeddings, our approach utilizes an agglomerative clustering algorithm. This algorithm iteratively merges the nearest clusters based on the average linkage criterion. To maintain the integrity of the graph's topology, clusters are only compared if there exists a direct path between them that bypasses other clusters. The height of the hierarchy was chosen to match the height of Louvain for the sake of a fair comparison between the two approaches [3], but it could also be defined through any dendrogram cutting strategy.
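A sketch of topology-respecting agglomerative clustering, assuming scikit-learn (an assumption; the paper does not name its implementation). Its connectivity argument restricts merges to clusters joined by an edge of the state-transition graph, which only approximates the constraint described above:

import networkx as nx
from sklearn.cluster import AgglomerativeClustering

def cluster_states(G, embeddings, n_clusters):
    # Average-linkage clustering in the Graphwave latent space; the
    # adjacency matrix as a connectivity constraint keeps every merged
    # cluster connected in the state-transition graph.
    connectivity = nx.to_scipy_sparse_array(G.to_undirected())
    model = AgglomerativeClustering(
        n_clusters=n_clusters,
        linkage="average",
        metric="euclidean",
        connectivity=connectivity,
    )
    return model.fit_predict(embeddings)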
3.3 Option learning
For the sake of comparison with Evans et al. [3], we construct the skill hierarchy similarly, as follows. Let $h$ represent the number of partitions produced by our algorithm when applied to the state transition graph. Each of these $h$ partitions defines a skill layer, forming an action hierarchy with $h$ levels of abstract actions above primitive actions. Each hierarchy level consists of skills designed to efficiently navigate between clusters of the state transition graph.
An option for moving from cluster $c_i$ to cluster $c_j$ is defined by: initiation states in $c_i$, a policy to navigate from any state in $c_i$ to a state in $c_j$, and termination upon reaching $c_j$.
Leveraging the hierarchical structure of the partitions, we define skills at each level of the hierarchy using the skills from the preceding level. At each hierarchy level, the policies for higher-level actions call actions (either options or primitive actions) from the level below, with primitive actions only invoked directly at the base level.

4 EVALUATION
4.1 Domains
The skill hierarchy is evaluated in four environments (Figure 1), the first three of which are different examples of the rooms environment: an empty room, two rooms connected by a bottleneck, and four rooms as in [3]. The agent is given a starting position and a goal position and attempts to navigate between them as effectively as possible. The last domain we look at is the Towers of Hanoi, a classic mathematical puzzle. It involves moving a set of disks from one peg to another, following specific rules.

Figure 1: The environments used in the experiments. (a) Empty Room: a simple environment with no obstacles. (b) Two Rooms: an environment divided into two connected rooms. (c) Four Rooms: a more complex environment divided into four connected rooms. (d) Towers of Hanoi: a classic puzzle environment where the goal is to move disks between pegs according to specific rules.

Each of the rooms environments features four basic movements: north, south, east, and west. These movements steer the agent in the chosen direction unless obstructed by a wall, in which case the agent stays in place. Each action incurs a penalty of -0.001, with a bonus of +1.0 awarded upon reaching the goal state. Each run begins from a designated start state and aims for a goal state.
The Towers of Hanoi involves four disks of varying sizes positioned on three pegs. An episode commences with all disks stacked on the leftmost peg. Actions involve moving the top disk from one peg to another, ensuring no larger disk is placed on a smaller one. Each action incurs a -0.001 penalty, with an additional +1.0 reward granted upon achieving the goal state, which is when all disks are stacked on the rightmost peg.

4.2 Structural Skill Hierarchy
The hierarchy obtained through the above clustering method clusters structurally similar nodes together. To showcase an example, we show the hierarchical clustering of Two Rooms (Figure 2). At the lowest level, the walls of the rooms as well as the bottleneck state are clustered together. The corners of the rooms are given their own individual clusters, and the center of each room is partitioned symmetrically with respect to the bottleneck state. The second level then merges most of the interiors of the individual rooms while still giving the corner states their individual clusters. The final level merges the corner states into the walls of the rooms and gives three clusters: the two interiors of the rooms and a final cluster that essentially contains these two rooms.

Figure 2: Hierarchical clustering of the Two Rooms environment at various levels. The lowest level (left-most image) clusters walls and bottleneck states, the second level (middle image) merges room interiors while keeping corners separate, and the final level (right-most image) combines corners into walls, resulting in three main clusters.

4.3 Results
To compare with Evans et al. [3], we created options by building the full state transition graph and then learned their policies offline using macro-Q learning [5]. We trained all hierarchical agents with macro-Q learning and intra-option learning [7]. Shaded areas in the learning curves show the standard error, based on 40 independent trials.
The parameters were the same as in [3]: a learning rate of $\alpha = 0.4$, a discount factor of $\gamma = 1$, and initial action values of $Q_0 = 0$. An $\epsilon$-greedy strategy with $\epsilon = 0.1$ was used for exploration. The shown learning curves represent evaluation performance: after each training epoch, the policy was assessed (with exploration and learning disabled) in a separate environment instance.
We observe (Figure 3) that in the Empty Room domain we quickly converge to a rewarding strategy before eventually seeing the Louvain skill hierarchy catch up. In the Two Rooms environment we observe much faster convergence, with Louvain catching up only relatively slowly. The Four Rooms domain clearly favours the Louvain skill hierarchy; one might attribute this to the greater importance of traversing between rooms quickly, which is optimally done through a clustering that relies on modularity. We outperform the Louvain skill hierarchy in the Towers of Hanoi, converging faster, with Louvain catching up later.

Figure 3: Performance of hierarchical agents using Louvain and Graphwave skill hierarchies in different environments: Empty Room, Two Rooms, Four Rooms, and Towers of Hanoi. In the Empty Room environment, both skill hierarchies converge quickly to a rewarding strategy, with Graphwave performing better initially but Louvain catching up over time. In the Two Rooms environment, Graphwave converges significantly faster than Louvain. The Four Rooms domain favors the Louvain skill hierarchy, likely due to the importance of quickly traversing between rooms using a clustering that relies on modularity. In the Towers of Hanoi, Graphwave outperforms Louvain, showing faster convergence and maintaining an advantage throughout. The plots show the reward progression over epochs for each environment, highlighting the differences in performance and convergence rates between the Louvain and Graphwave skill hierarchies.

5 CONCLUSIONS
In this paper, we introduced a novel approach to hierarchical skill learning by leveraging Graphwave to obtain structural embeddings of the states. By clustering the states and preserving the topology of the state-space graph, we enabled efficient option learning, where options represent skills at various levels of abstraction. In our experiments, we compared the proposed approach to the state-of-the-art method by Evans et al. [3] and showed that our method can speed up the learning process significantly in some cases.
However, in some domains, optimizing for modularity obviously yields better skill hierarchies and faster option learning. It remains an open question for future research to determine which properties of a domain's state-space graph are better suited to each method. This question is related to another open challenge, namely the characterization of a useful skill: for a given complex task, what defines a proper skill hierarchy?
Future work could also explore incrementally building the state-space graph and deriving the optimal skill hierarchy for the partially observed graph. This approach may influence how confidently the partitioning is constructed over time, as new information becomes available and the graph evolves. One can develop algorithms that dynamically adjust the skill hierarchy based on the current state of the graph, ensuring that the hierarchy remains optimal as the environment changes. One may also pursue a similar direction in constructing skill hierarchies in problems that involve continuous state-spaces.
REFERENCES
[1] Pierre-Luc Bacon, Jean Harb, and Doina Precup. 2017. The Option-Critic Architecture. Proceedings of the AAAI Conference on Artificial Intelligence 31, 1 (Feb. 2017). https://doi.org/10.1609/aaai.v31i1.10916
[2] Claire Donnat, Marinka Zitnik, David Hallac, and Jure Leskovec. 2018. Learning Structural Node Embeddings Via Diffusion Wavelets. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1320-1329. https://doi.org/10.1145/3219819.3220025 arXiv:1710.10321 [cs, stat]
[3] Joshua B. Evans and Özgür Şimşek. 2024. Creating Multi-Level Skill Hierarchies in Reinforcement Learning. arXiv:2306.09980 [cs]
[4] Marlos Machado, Andre Barreto, and Doina Precup. 2021. Temporal Abstraction in Reinforcement Learning with the Successor Representation.
[5] Amy McGovern, Richard Sutton, and Andrew Fagg. 1999. Roles of Macro-Actions in Accelerating Reinforcement Learning. (Feb. 1999).
[6] Matthew Riemer, Miao Liu, and Gerald Tesauro. 2018. Learning Abstract Options. CoRR abs/1810.11583 (2018). arXiv:1810.11583 http://arxiv.org/abs/1810.11583
[7] Richard S. Sutton, Doina Precup, and Satinder Singh. 1998. Intra-Option Learning about Temporally Abstract Actions. In ICML, Vol. 98. 556-564.
[8] Richard S. Sutton, Doina Precup, and Satinder Singh. 1999. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence 112, 1 (1999), 181-211. https://doi.org/10.1016/S0004-3702(99)00052-1
[9] Xiao Xu, Mei Yang, and Ge Li. 2018. Constructing Temporally Extended Actions through Incremental Community Detection. Computational Intelligence and Neuroscience 2018, 1 (2018), 2085721. https://doi.org/10.1155/2018/2085721

Integration of Named Entity Extraction Based on Deep Learning for Neo4j Graph Database

Lea Roj (l.roj@um.si), University of Maribor, Faculty of Electrical Engineering and Computer Science, Maribor, Slovenia
Štefan Kohek (stefan.kohek@um.si), University of Maribor, Faculty of Electrical Engineering and Computer Science, Maribor, Slovenia
Aleksander Pur (pur.aleksander@gmail.com), Ministry of the Interior, Ljubljana, Slovenia
Niko Lukač (niko.lukac@um.si), University of Maribor, Faculty of Electrical Engineering and Computer Science, Maribor, Slovenia
DOI: https://doi.org/10.18690/um.feri.6.2024.3

Abstract
The increase in unstructured textual data has created a pressing demand for effective information extraction techniques. This paper explores the integration of Named Entity Extraction (NEE) using deep learning within the Neo4j graph database. Utilizing the Rebel Large Model, we converted raw text into structured knowledge graphs. The primary objective is to evaluate the efficacy of this integration by examining performance metrics, such as processing time, graph growth, and entity representation. The findings highlight how the structure and complexity of graphs vary with different text lengths, offering insights into the potential of combining deep learning-based NEE with graph databases for improved data analysis and decision-making.

Keywords
Named entity extraction, deep learning, Neo4j, graph database, knowledge graphs

1 Introduction
The rise of digital news and social media has significantly increased the importance of NEE. As more information is generated online, extracting this information has become critical for various applications, such as search engines and recommendation systems [1].
NEE, along with Relation Extraction (RE), is essential for transforming unstructured text into structured data, enabling more effective data analysis and decision-making. Building on the findings of previous work [2] that evaluated various hyper-parameters and analyzed sensitivity performance, this paper takes a step further by exploring the integration of NEE and RE within the Neo4j graph database.
Neo4j is a graph database that provides a powerful way to store and query complex relationships between entities. This makes it well suited for applications involving interconnected data, such as social networks, recommendation systems, and fraud detection. In Neo4j, data is stored as nodes and relationships. Nodes represent entities, while relationships represent the connections between these entities. Both can have properties (key-value pairs) to store additional information. Neo4j uses Cypher, a declarative query language specifically designed for querying graph databases [3].
Integrating Named Entity Recognition (NER) based on deep learning within Neo4j graph databases has been an area of active research and development. Ni et al. [4] address the challenge of translating natural language queries into graph database queries for intelligent medical consultation systems. The authors developed a Text-to-GraphQL model that utilizes a language model with a pre-trained Adapter, enhancing the semantic parsing capabilities by linking GraphQL schemas with corresponding natural language utterances.
Fan et al. [5] describe, for geological hazards, a deep learning-based NER model that was used to construct a knowledge graph from the literature. This model addresses challenges such as diverse entity forms, semantic ambiguity, and contextual uncertainty. The resulting knowledge graph, stored in Neo4j, enhances the usability of geological research data.
Chaudhary et al. [6] propose a system that converts raw text into a knowledge graph using Neo4j, addressing inefficiencies in traditional tools like Spacy, NLTK, and Flair. Their method combines entity linkage and RE to convert unstructured data into a knowledge graph, leveraging graph-based NER and Linking for a contextual understanding of data. The implementation utilizes the REBEL [7] model for RE. In comparison with our approach, they use the BLINK [8] model for entity disambiguation and linking, whereas we focus on efficient entity normalization by querying Wikipedia. Furthermore, while Chaudhary et al.'s system emphasizes improvements for processing large untagged datasets using graph-based NER and linking, we achieve comparable results using traditional NER through Spacy while focusing on entity filtration and knowledge enrichment. We also performed several graph analytics in Neo4j, providing deeper insights and analysis.
The objective of this paper is to demonstrate the implementation process of integrating NEE into the Neo4j graph database. It aims to evaluate the effectiveness of this integration and analyze various performance metrics. Specifically, the paper measures the processing time required to extract named entities from text and represent them in Neo4j, analyzes graph growth in relation to text length, evaluates the average total neighbors score based on text length, and analyzes how many entities are actually shown in the graph and how many are filtered out.
The next section details the workflow from text pre-processing to graph visualization in Neo4j. The Results section showcases the findings, including charts that visualize the performance metrics. Finally, the Conclusion summarizes the benefits and purpose of the integration, highlights key findings from the study, and suggests potential areas for future research and development.

2 Methodology
The workflow from text pre-processing to graph construction in Neo4j is represented in Figure 1. The entire process consists of multiple crucial steps, including text pre-processing, NEE, RE, entity normalization and filtration, and finally the generation and visualization of the knowledge graph in Neo4j. The details of these steps have been discussed in depth in our previous paper [2].

Figure 1: Workflow (text pre-processing → NEE → RE → normalization → entity filtration → graph generation → analysis and visualization in Neo4j).

Text pre-processing involves segmenting the text into manageable spans, with the span length defining the number of words in each segment and the overlap length ensuring coherence between consecutive spans. The length penalty manages the impact of longer sequences, while the number of beams allows simultaneous exploration of multiple sequences to find the best one. The number of returns specifies how many sequences are returned after the beam search.
NEE identifies and classifies entities by tokenizing the text, with each token corresponding to a unique word ID. The Rebel Large Model, a sequence-to-sequence model based on the T5 architecture, is employed for RE tasks [7]. This model leverages deep learning techniques to process up to 512 tokens as input and generates triplets consisting of a subject (head), an object (tail), and the relationship type between them. Using a transformer-based encoder-decoder architecture, these triplets are extracted from textual spans, where each relationship is first predicted in token form and then decoded into text.
Building on the previous paper, this paper proposes improvements in normalization and entity filtration, as follows. Entity names are first standardized by converting text to lowercase and removing common prefixes, followed by verification via Wikipedia's API. Non-contributive entities, such as dates or overly generic terms, are identified and excluded using pattern recognition and categorization techniques. The system checks for duplicates or highly similar entities to prevent clutter, merging or discarding them as needed. Cosine similarity measures are used to assess and reinforce thematic links between entities, enhancing the overall coherence of the knowledge base.
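A sketch of span-level triplet generation with the publicly released REBEL model; the exact checkpoint name (Babelscape/rebel-large on Hugging Face) is our assumption, and the generation settings mirror the parameters described above:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Babelscape/rebel-large")
model = AutoModelForSeq2SeqLM.from_pretrained("Babelscape/rebel-large")

span = "Pablo Escobar was a Colombian drug lord who founded the Medellin Cartel."
inputs = tokenizer(span, return_tensors="pt", truncation=True, max_length=512)

outputs = model.generate(
    **inputs,
    max_length=256,
    length_penalty=0.0,       # length penalty = 0
    num_beams=5,              # number of beams = 5
    num_return_sequences=2,   # number of returns = 2
)
# Each decoded sequence interleaves special <triplet>/<subj>/<obj> markers
# that are subsequently parsed into (head, relation type, tail) triplets.
for seq in tokenizer.batch_decode(outputs, skip_special_tokens=False):
    print(seq)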
2.1 Knowledge graph within Neo4j
A knowledge graph is generated from the extracted and filtered entities and relations. This structured representation helps in visualizing the connections and relationships within the text. Finally, the knowledge graph is stored and visualized in Neo4j.
In the integration process, the data obtained using NER is saved to the graph database through the Neo4j driver. Afterwards, the method iterates over the entities in the knowledge base to determine the category of each entity. Using the MERGE Cypher command, the method either finds an existing node (based on the name) or creates a new node if none exists. Attributes such as 'url', 'summary' and 'category' are then added to each node.

MERGE (e:Entity {name: $entity})
ON CREATE SET e.url = $url, e.summary = $summary
SET e.category = coalesce(e.category, $category)

After adding the entities, we process each relationship defined in the knowledge base. We ensure that both entities involved in the relationship are present in the database and then create a relationship between them using the MERGE command if it does not already exist.

MATCH (head:Entity {name: 'EntityName1'}),
      (tail:Entity {name: 'EntityName2'})
MERGE (head)-[r:RELATIONSHIP_TYPE]->(tail)

The knowledge graph is visualized in Neo4j to provide an intuitive and interactive representation of the extracted knowledge. Using Cypher queries, users can explore the graph, examine relationships, and derive insights from the interconnected data. To display the graph in the Neo4j application, we use the following Cypher query.

MATCH (n)-[r]->(m) RETURN n, r, m

This query retrieves all nodes (n, m) and the relationships (r) between them, displaying the graph structure in the Neo4j interface. The directed edges in the graph illustrate the relationships, providing a clear visual representation of the underlying knowledge.
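A sketch of how the two MERGE queries above can be issued through the official Neo4j Python driver (a 5.x driver and illustrative connection settings and helper names are assumed):

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))  # illustrative

def add_entity(tx, name, url, summary, category):
    # Mirrors the first MERGE query: create the node only if it is new.
    tx.run(
        "MERGE (e:Entity {name: $name}) "
        "ON CREATE SET e.url = $url, e.summary = $summary "
        "SET e.category = coalesce(e.category, $category)",
        name=name, url=url, summary=summary, category=category,
    )

def add_relation(tx, head, tail, rel_type):
    # Relationship types cannot be passed as query parameters, hence the
    # interpolation; rel_type must be sanitized beforehand.
    tx.run(
        "MATCH (h:Entity {name: $head}), (t:Entity {name: $tail}) "
        f"MERGE (h)-[:`{rel_type}`]->(t)",
        head=head, tail=tail,
    )

with driver.session() as session:
    session.execute_write(add_entity, "pablo escobar",
                          "https://en.wikipedia.org/wiki/Pablo_Escobar",
                          "Colombian drug lord", "person")
    session.execute_write(add_entity, "medellin cartel", "", "", "organization")
    session.execute_write(add_relation, "medellin cartel", "pablo escobar",
                          "FOUNDED_BY")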
2.2 Knowledge Graph Analysis in Neo4j

After constructing the knowledge graph in Neo4j, various metrics and analyses were applied to explore the structure within the graph. One such metric is the Total Neighbors score, which measures the closeness of nodes by counting their unique neighbors. It is based on the idea that a highly connected node is more likely to gain new links. The metric is calculated using the following formula:

    TN(x, y) = |N(x) ∪ N(y)|,    (1)

where N(x) and N(y) represent the sets of nodes adjacent to x and y, respectively. The Total Neighbors score measures the closeness of two nodes based on the number of unique neighbors they have. A score equal to 0 indicates no closeness between the nodes, while higher scores indicate greater closeness [9]. The gds.alpha.linkprediction.totalNeighbors function from the Neo4j Graph Data Science (GDS) library calculates the Total Neighbors score between the two matched nodes (p1 and p2).
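For intuition, formula (1) can also be evaluated directly from adjacency sets; the snippet below is our illustration of the definition (inside the database, the GDS function named above computes the same score natively).

def total_neighbors(adjacency, x, y):
    # adjacency: dict mapping each node to the set of its neighbors.
    # TN(x, y) = |N(x) ∪ N(y)|
    return len(adjacency[x] | adjacency[y])

adjacency = {
    "Pablo Escobar": {"Colombia", "Cocaine"},
    "Colombia": {"Pablo Escobar", "Peru"},
    "Peru": {"Colombia"},
    "Cocaine": {"Pablo Escobar"},
}
print(total_neighbors(adjacency, "Pablo Escobar", "Peru"))  # 2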
3 Results

The analysis was conducted using the text about Pablo Escobar from Wikipedia. For this paper, the original text was divided into sections of varying lengths to examine how text length influences the analysis results. Figure 2 displays a generated graph with a text length of 304 words. The graph was created using specific parameters that influence its structure and content. These parameters were heuristically determined to be span length = 30, length penalty = 0, number of beams = 5, number of returns = 2, and overlap length = 10. Using the same set of parameters, we measured processing time and the similarity score based on total neighbors, and analyzed graph growth.

Figure 2: Generated graph with text length = 304 words (the graph centres on Pablo Escobar, with nodes such as the Medellín Cartel, Colombia, Cocaine, and related dates, occupations, and countries).

In Figure 3, the time required for graph generation is shown to increase linearly with the number of words in the text. This linearity is confirmed by a regression analysis, which yields an R² value of 0.9984, indicating an almost perfect fit.

Figure 3: Influence of text length on execution time.

Figure 4 demonstrates the influence of text length on the number of recognized entities. As the number of words in the text increases, there is a corresponding increase in both the number of nodes shown and the number of entities that are recognized but not displayed. This pattern indicates that longer texts result in the recognition of more entities, although not all are displayed. The decision to display or exclude entities is determined by several processes designed to maintain the clarity and relevance of the graph. These processes include the combination and unification of similar entities, the removal of isolated entities, and the filtering out of date-related entities. These processes are essential for maintaining the graph's relevance and clarity, preventing clutter from redundant or less significant entities.

Figure 4: Comparison of recognized and visualized entities based on text length (text lengths from 28 to 1906 words; the legend distinguishes the number of recognised entities in the text from the number of entities shown in the graph).

The following analyses represent the average similarity score for all possible pairs of entities (n1, n2) in the graph. For this, the link prediction algorithm (totalNeighbors) is used to assess how connected
This indicates that longer financed by Ministry of the Interior, Slovenia. texts tend to have more instances of highly interconnected nodes. This is due to the increased probability of recurring entities in References longer texts, which leads to more common neighbors. The text [1] Ing Michal Konkol. Named entity recognition. Pilsen: PhD thesis, University of West Bohemia, 2015. with the shortest length (28 words) has the lowest similarity score [2] Lea Roj, Štefan Kohek, Aleksander Pur, and Niko Lukač. Sensitivity analysis of (3.2). This suggest that very short texts lack sufficient content to named entity extraction based on deep learning. establish strong connections between entities. The highest scores [3] Renzo Angles, Marcelo Arenas, Pablo Barceló, Aidan Hogan, Juan Reutter, and Domagoj Vrgoč. Foundations of modern query languages for graph databases. for both average and top ten similarities occur in the longest texts ACM Comput. Surv., 50(5), sep 2017. (1593, 1799, 1906 words). This supports the idea that more extensive [4] Pin Ni, Ramin Okhrati, Steven Guan, and Victor Chang. Knowledge graph and deep learning-based text-to-graphql model for intelligent medical consultation content provides more opportunities for entities to connect or relate. chatbot. Information Systems Frontiers, 26(1):137–156, 2024. [5] Runyu Fan, Lizhe Wang, Jining Yan, Weijing Song, Yingqian Zhu, and Xiaodao 4 Conclusion Chen. Deep learning-based named entity recognition and knowledge graph construction for geological hazards. ISPRS International Journal of Geo-Information, The results demonstrate that text length significantly impacts the 9(1), 2020. performance and outcomes of NEE within Neo4j using deep learn- [6] Shikha Chaudhary, Hirenkumar Vyas, Naveen Arora, and Sejal D’Mello. Graph-based named entity information retrieval from news articles using neo4j. In 2024 ing techniques. As text length increases, so does the processing time 11th International Conference on Computing for Sustainable Global Development for graph visualization, due to the need to extract and manage a (INDIACom), pages 320–324, 2024. larger number of entities and relationships. Moreover, the analysis [7] Pere-Lluís Huguet Cabot and Roberto Navigli. Rebel: Relation extraction by end-to-end language generation. In Findings of the Association for Computational of graph structure revealed that longer texts tend to produce more Linguistics: EMNLP 2021, pages 2370–2381, 2021. nodes, both displayed and recognized but not shown. This suggests [8] Martin Josifoski Sebastian Riedel Luke Zettlemoyer Ledell Wu, Fabio Petroni. Zero-shot entity linking with dense entity retrieval. In EMNLP, 2020. that while longer texts provide more data, they also introduce chal- [9] Neo4j. Total Neighbors Algorithm, 2024. Accessed: 2024-06-25. lenges in managing graph complexity, which complicates graph management and requires the consolidation of similar entities and filtering of less relevant ones to maintain clarity. Furthermore the 14 Proc. 
Efficient Implementation of Spreadsheet User Application

Tjaša Repič (tjasa.repic@student.um.si), University of Maribor, Faculty of Electrical Engineering and Computer Science, Maribor, Slovenia
Aljaž Jeromel (aljaz.jeromel@um.si), University of Maribor, Faculty of Electrical Engineering and Computer Science, Maribor, Slovenia
Sašo Piskar (saso.piskar@dewesoft.com), DEWESoft d.o.o., Trbovlje, Slovenia
Domen Dolanc (domen.dolanc@dewesoft.com), DEWESoft d.o.o., Trbovlje, Slovenia
Niko Lukač (niko.lukac@um.si), University of Maribor, Faculty of Electrical Engineering and Computer Science, Maribor, Slovenia

ABSTRACT

Processing measurement data is fundamental in the field of high-tech instrumentation, where precision, collection, analysis, and visualization of data are of great importance. Extensive amounts of data ought to be displayed and processed to ensure a smooth user experience. Tabular displays are therefore common, being more comprehensible for the average user. In this paper we propose a solution, envisioned by the company Dewesoft: a spreadsheet editor widget tailored for their data acquisition software DewesoftX, also compatible with separate plugins within the software. Since using commercially widespread tools for this purpose often results in setbacks when seeking to integrate those within existing software, we have developed an application functionally comparable to other solutions while complying with the company's existing software standards.

KEYWORDS

spreadsheet, tabular data, optimisation, user experience, data visualisation

1 INTRODUCTION

We widely adopt spreadsheets for their familiarity and versatility, offering extensive features for data manipulation, statistical calculations, data collection, and visualization. Their accessibility makes them a preferred choice for both individuals and businesses. However, spreadsheets also have notable drawbacks. They are susceptible to various input errors, including clerical mistakes, rule violations, data-entry errors, and formula errors, which can significantly distort the data. Additionally, spreadsheets are not inherently designed for efficient data storage or seamless connectivity to relational databases, posing challenges in effective data management and retention [1, 2].

In modern spreadsheet tools, providing a clear and efficient user experience involves several essential elements. These include an intuitive interface with a clean layout, consistent design, tool tips, and help guides. User-friendly navigation is achieved through a well-organized toolbar, robust search functionality, and keyboard shortcuts. Data visualization and formatting are enhanced by features like conditional formatting and predefined styles. Comprehensive formula support includes auto-complete, error-checking tools, and a rich library of functions. Robust data management capabilities are also crucial, including import/export options, data validation, and integration with other tools [3, 4].

Within the initial design phase of the proposed Spreadsheet plugin solution, a crucial aspect of planning involved acquiring a deeper understanding of the DewesoftX software for which the plugin development was intended [5]. Processing measurement data is a crucial aspect of the advanced test and measurement industry, where the company Dewesoft operates [6]. A key component of Dewesoft's offering is the DewesoftX software. Many fields in science, commerce and the like require precise measurement, data collection, analysis and visualization. Handling such large volumes of data can prove challenging and inefficient, reducing productivity and convenience and increasing the risk of making mistakes. The software in question has been designed specifically to solve this issue and is used across multiple industrial and commercial sectors. It supports a wide range of interfaces for data visualization, allowing for the synchronized acquisition of data from nearly any analog sensor, its storage, and its visualization within the same file.

When discussing tools designed to display tabular data, it is essential to consider mathability. Mathability in spreadsheet tools refers to their capacity to perform complex mathematical operations with efficiency and accuracy. This capability is crucial as it ensures precise calculations, boosts productivity, and accommodates various
applications across fields such as finance, engineering, and data science [7].

Our solution enhances data handling and provides clearer visualization, prioritizing environments where precise, synchronized data acquisition is critical, such as the software itself, DewesoftX. It is meant to integrate the spreadsheet tool within the software, therefore offering more advanced data handling, synchronization and visualization capabilities tailored to the user's needs, while avoiding the potential issues with safety, space and integration that arise from using already existing tools. This paper presents its basic functions and the thought process behind their implementation, providing detailed explanations, as well as the results of duration and memory usage of various supported functionalities.

DOI: https://doi.org/10.18690/um.feri.6.2024.4

2 METHODOLOGY

The purpose of the following subsections is to provide a breakdown, as well as a thorough explanation, of the implementation process of the spreadsheet user application.

2.1 Fundamental Features

2.1.1 Spreadsheet Widget Layout. To enhance data accessibility during development and make the layout overview clearer, the widget workspace was divided into three parts. The visual widget is segmented into two primary regions. The first is the context menu, containing various button shortcuts for features that will be discussed further in this article. The software's user interface was originally developed in Delphi, using its own VCL (Visual Component Library) [8]. VCL is built on the Win32 architecture, sharing a similar structure but offering much simpler usage [9]. The second region, referred to as the spreadsheet rectangle or "TableRect", encompasses all data and its layout within the widget, including information about cells, columns, rows and the spreadsheet title. A smaller section called the data rectangle or "DataRect" additionally handles cell information. The context menu and the spreadsheet workspace can be seen in Figure 1.

Figure 1: Separation between the context menu (blue) and the area within which the spreadsheet's data is displayed (green).

We also provide users with the option to save data for future use. This is done by writing cell values, styles, merged cell information, resized columns and manually adjusted titles to a custom DewesoftX file format for saving the workspace, which will further be referred to as the .DXS or setup file. The data can then be read from the setup file after loading the workspace at a later time.

2.1.2 Cell Manipulation. Initially, we made an effort to define the essential functionalities that constitute the main purpose and value of the application by dismantling related spreadsheet editors [10–12]. The most fundamental feature of the spreadsheet grid is the ability to insert and update data within the cells. In order to grant each cell its own unique value, we implemented a hash function, which generates a hash value from two integer inputs. These are determined by the cell's position within the grid, with the x value corresponding to the column and the y value corresponding to the row. The row index masks the lower 16 bits of the integer and shifts them 16 bits to the left. The column index is masked in the same way and remains unshifted. Using the bitwise OR operation, the two 16-bit values are combined into a single 32-bit integer. Implementation of a hash map designed to store the 32-bit value grants us a key for each cell, allowing us to insert and update data at the cell's index by combining characters received through user input into coherent values.

Where y is the row index, x is the column index, & represents the bitwise AND operation, << represents the bitwise left shift operation, | represents the bitwise OR operation, and 0xFFFF is a hexadecimal constant masking the lower 16 bits, the hash key formula can be expressed as:

    hash_key = ((y & 0xFFFF) << 16) | (x & 0xFFFF)    (1)

The hash map provides O(1) time complexity for retrieving data from the selected cell, which is highly desirable, as it means that the time required to perform an operation is constant and does not depend on the size of the data set.
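A minimal Python transliteration of formula (1) and the hash-map storage built on it is shown below; the widget itself is not written in Python, so this is purely illustrative and the names are ours.

def cell_key(x: int, y: int) -> int:
    # Formula (1): pack the row (y) into the upper 16 bits and the
    # column (x) into the lower 16 bits of one 32-bit key.
    return ((y & 0xFFFF) << 16) | (x & 0xFFFF)

def cell_coords(key: int) -> tuple[int, int]:
    # Inverse mapping, recovering (x, y) from a packed key.
    return key & 0xFFFF, (key >> 16) & 0xFFFF

cells = {}                    # hash map: packed key -> cell contents
cells[cell_key(3, 7)] = "42"  # O(1) insert/update for cell x=3, y=7
assert cell_coords(cell_key(3, 7)) == (3, 7)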
The Spreadsheet widget also supports cell splitting and merging. Selected cells can be merged into a larger cell, which then behaves like a standard cell. To facilitate this functionality, cells include an additional parameter, "merged to", which records the cell to which they are merged. By default, for cells that are not merged, this parameter is set to -1.

A dedicated class manages cell selection within the spreadsheet, defining it by specifying the starting and ending column and row indexes. This enables users to efficiently apply operations to a range of cells, rather than being limited to individual cells.

2.2 Spreadsheet Formatting

Allowing users to stylize components in a user application is important for several reasons, including enabling customization to tailor the application's appearance to the user's individual preferences and allowing them to highlight important information and organize content according to their needs. For this purpose, we have incorporated various styling features into the spreadsheet user application.

2.2.1 Spreadsheet Stylizing. To enhance customization, we introduced a structure within the Style class containing eighteen properties applicable to all spreadsheet components, including cells, selections, rows, and columns. Sixteen of these properties handle styling aspects such as font family, size, color, cell background, bold, italic, underline, border properties, and text alignment. Only user-defined values are saved, optimizing file size and reading/writing speeds. The remaining two properties are a property mask, assigning a binary value to each style feature, and a vector of integers that prioritizes styles across cells, rows, and columns, ensuring correct value application when styles intersect.

2.2.2 Resizing Columns and Rows. We further enhanced spreadsheet customization by implementing the functionality for users to
resize columns according to their specific needs. To resize a column, the user must trigger a mouse down event on the right edge of any column. This interaction detects the user's intention to change the column width and memorizes the index it has been detected on, then determines the mouse movement according to the following equation:

    moveX = x − O[r − 1] − Dx ,    (2)

where moveX represents the horizontal distance the mouse has moved during the column resizing operation and x is the current horizontal position of the mouse cursor. The offsets array O holds the horizontal positions of the left edges of each column that is currently displayed in the spreadsheet, while r represents the selected index intended to be resized. Lastly, Dx represents the x-coordinate of the data rectangle's origin. This value is subtracted to ensure the calculation is relative to the data area.

We proceed with the operation by applying the following equation:

    resizeRatio = max(moveX / cellWidth, 0.1)    (3)

The equation calculates the ratio of the mouse movement to the cell width. It then ensures that this ratio is not less than 0.1 by using the max function. The calculated ratio is then set as the new column width at the resize index. Additionally, note that the cell width is the default width of the cells, which is calculated based on the font size settings.
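Equations (2) and (3) amount to the following small computation, sketched here in Python with our own variable names:

def resize_ratio(mouse_x, offsets, r, data_rect_x, cell_width):
    # Equation (2): horizontal mouse travel, measured relative to the
    # data rectangle's origin and the left edge of the resized column.
    move_x = mouse_x - offsets[r - 1] - data_rect_x
    # Equation (3): the new width as a ratio of the default cell
    # width, clamped so a column never drops below 10 % of it.
    return max(move_x / cell_width, 0.1)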
2.3 Undoing and Redoing Spreadsheet Actions

Within the context of the spreadsheet application, we defined the undo and redo functionality as a state machine capable of switching between the current and previous states after applying a change to the spreadsheet and calling one of said operations.

To track state changes, we designed and implemented the "TableAction" structure. Different state changes require modifications to various types of data. The TableAction structure simplifies the process by encapsulating the type of action triggered along with the parameters necessary for adjustment. We have provided detailed definitions of the various spreadsheet actions, as shown in Algorithm 1.

The spreadsheet actions are managed within the respective undo and redo vectors, whose maximum size is set to twenty actions. Upon triggering an add-action event, the initial state prior to the change is recorded in the undo vector, while the post-change state, along with its corresponding action type, is recorded in the redo vector. When the user initiates an undo or redo event, either via the context menu or keyboard shortcuts, Algorithm 1 is executed.

To summarize, the algorithm begins by verifying the feasibility of the state change, ensuring that there is at least one recorded action in the action counter. If this condition is met, the action counter is adjusted appropriately. The algorithm then executes the necessary statements based on the type of action, ensuring that the corresponding data is modified accordingly. Finally, the visible state of the spreadsheet is updated to reflect these changes.

Data: Action history
Result: Undo or redo an action
UndoOrRedo(isRedo):
    if no actions available then return
    get action, adjust counter
    if action is Insert/Update then set cell, edit value
    else if action is SetStyle then apply styles
    else if action is Resize then apply size changes
    else if action is Merge/Split then update merge states
    else if action is TextPaste then apply text
    else if action is Sort then apply sorting
    else if action is Paste then apply data and styles
    update visible cells
Algorithm 1: Undo/Redo algorithm.
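The control flow of Algorithm 1 can be sketched in Python as follows. This is a loose illustration only: the apply_* handler methods are hypothetical stand-ins for the per-action branches, and the paper's shared action counter is simplified into two plain stacks.

MAX_ACTIONS = 20  # both vectors are capped at twenty actions

class UndoRedoManager:
    def __init__(self, spreadsheet):
        self.spreadsheet = spreadsheet
        self.undo_stack = []  # (action_type, state before the change)
        self.redo_stack = []  # (action_type, state after the change)

    def record(self, action_type, before_state, after_state):
        # Called on every tracked change; keeps only the newest entries.
        self.undo_stack.append((action_type, before_state))
        self.redo_stack.append((action_type, after_state))
        del self.undo_stack[:-MAX_ACTIONS]
        del self.redo_stack[:-MAX_ACTIONS]

    def undo_or_redo(self, is_redo):
        stack = self.redo_stack if is_redo else self.undo_stack
        if not stack:               # feasibility check
            return
        action_type, state = stack.pop()
        # Dispatch on the action type, mirroring Algorithm 1's branches.
        getattr(self.spreadsheet, "apply_" + action_type)(state)
        self.spreadsheet.update_visible_cells()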
3 RESULTS

The measurements leading to the results presented in this paper were conducted on a system equipped with an AMD Ryzen 9 3900X 12-core processor and 64 GB of RAM. It is also important to note that DewesoftX version 2024.2 was used when conducting these measurements.

For the testing, we evaluated three spreadsheet functionalities across three progressively larger cell selections. The functionalities tested included pasting values into cells (see Table 1), undoing and redoing font colour changes (see Table 2), and loading from and saving to the setup file with the font colour state being set (see Table 3 and Table 4). The cell selection ranges used for these tests were 5x5, 25x25, and 50x50. The data used in these experiments comprised text, numeric values, and dates, with font colour applied as specified. Each evaluation was conducted ten times under identical conditions, and the results were averaged to ensure accuracy.

Table 1: Evaluation of cell value pasting.

Cell Selection Range   Memory Usage [MB]   Duration [ms]
5x5                    4.4                 0.812
25x25                  16.1                13.238
50x50                  44.1                50.179

The data in Table 1 show that memory usage increases significantly with the size of the cell selection, from 4.4 MB for 5x5 to 44.1 MB for 50x50. The duration also increases with the size of the cell selection, from 0.812 ms for 5x5 to 50.179 ms for 50x50. A larger selection is expected to be more memory-intensive than its smaller counterpart. Interestingly, a selection 100 times larger is only 10 times more memory-intensive. This can be explained by the fact that the memory required for the basic widget to display correctly is also included in the measurement and is independent of the amount of data stored in the spreadsheet. It is also worth noting that the duration does not increase linearly.

Table 2: Evaluation of undoing/redoing the font colour property state.

Cell Selection Range   Memory Usage [MB]   Duration [ms]
5x5                    0.134               0.292
25x25                  0.722               1.157
50x50                  1.5                 5.766

Within Table 2, memory usage increases modestly with larger cell selections, from 0.134 MB for 5x5 to 1.5 MB for 50x50. The duration also increases, from 0.292 ms for 5x5 to 5.766 ms for 50x50. The memory and time required to undo/redo font colour changes grow as the cell selection range expands. Comparing the measurements of the undo/redo function with the paste-into-cells function, we can conclude that the memory usage for the former is considerably lower than that for the latter.

Table 3: Evaluation of loading from setup where the font colour property is set.

Cell Selection Range   Memory Usage [MB]   Duration [ms]
5x5                    97.8                1.769
25x25                  117.6               33.173
50x50                  123.6               132.758

For Table 3, memory usage increases with larger cell selections, from 97.8 MB for 5x5 to 123.6 MB for 50x50. The duration shows an increase from 1.769 ms for 5x5 to 132.758 ms for 50x50, indicating that loading from setup becomes more time-consuming with larger selections.

Table 4: Evaluation of saving to setup where the font colour property is set.

Cell Selection Range   Memory Usage [MB]   Duration [ms]
5x5                    0.234               3.752
25x25                  0.293               66.618
50x50                  0.356               260.720

Lastly, for Table 4, memory usage shows a slight increase with larger cell selections, from 0.234 MB for 5x5 to 0.356 MB for 50x50. The duration increases significantly with the size of the cell selection, from 3.752 ms for 5x5 to 260.720 ms for 50x50. This indicates that saving to setup is considerably more time-consuming as the cell selection size grows.

As expected, the results indicate that both memory usage and duration generally increase with larger cell selection ranges across all functionalities. Pasting values and saving to setup are particularly resource-intensive, whereas undoing and redoing font colour changes show a moderate increase in resource requirements. Loading from setup shows a notable increase in duration with larger selections, highlighting the complexity of handling larger datasets.

4 CONCLUSION

In this paper, we proposed a solution for handling measurement data in high-tech instrumentation through a computationally efficient spreadsheet editor widget. This widget is tailored for integration with Dewesoft's data acquisition software, DewesoftX, aiming to offer functionality comparable to commercial spreadsheet tools while ensuring compatibility with existing software standards. The solution focuses on enhancing data accessibility and user experience by providing an intuitive interface, robust navigation, and comprehensive formatting and state manipulation features.

In future development, we aim to enhance the advanced mathability features essential for an efficient spreadsheet application. Specifically, we plan to implement a formula system within the widget, enabling users to input formulas and perform calculations directly within the spreadsheet. Additionally, we intend to incorporate conditional cell formatting, which automatically changes the appearance of cells based on their content to improve data visualization and analysis. We will also continue refining the existing features, taking user feedback into consideration to ensure the highest quality user experience possible.

ACKNOWLEDGMENTS

We are deeply thankful to Dewesoft and its representatives for their collaboration on this project, without which this paper would not have been possible.

REFERENCES
[1] F. Nurdiantoro, Y. Asnar, and T. E. Widagdo. The development of a data collection tool in spreadsheet format. Proceedings of the 2017 International Conference on Data and Software Engineering, ICoDSE 2017, 2018-January:1–6, July 2017.
[2] Srideep Chatterjee, Nithin Reddy Gopidi, Ravi Chandra Kyasa, and Prakash Prashanth Ravi. Evaluation of open source tools and development platforms for data analysis in engine development. SAE Technical Papers, pages 1–11, January 2015.
[3] Sabine Hipfl. Using layout information for spreadsheet visualization. Proceedings of the EuSpRIG 2004 Conference: Risk Reduction in End User Computing: Best Practice for Spreadsheet Users in the New Europe, pages 1–13, 2004.
[4] Bernard Liengme. A Guide to Microsoft Excel 2013 for Scientists and Engineers. Academic Press, London, United Kingdom, 2013.
[5] Dewesoft. Introduction | Dewesoft X Manual EN. https://manual.dewesoft.com/x/introduction, 2024. Accessed: 18-08-2024.
[6] Dewesoft. DewesoftX Award-Winning Data Acquisition and Digital Signal Processing Software. https://dewesoft.com/, 2024. Accessed: 21-08-2024.
[7] P. Biro and M. Csernoch. The mathability of spreadsheet tools. 6th IEEE Conference on Cognitive Infocommunications, CogInfoCom 2015 - Proceedings, pages 105–110, January 2016.
[8] Embarcadero Technologies. VCL Overview - RAD Studio. https://docwiki.embarcadero.com/RADStudio/Sydney/en/VCL_Overview, 2024. Accessed: 18-08-2024.
[9] Thomas Lauer. Porting to Win32: A Guide to Making Your Applications Ready for the 32-Bit Future of Windows. Springer-Verlag New York Inc., New York, NY, USA, 1996.
[10] Isaac Alejo. Google Sheets Tutorial Guide. Google Books, Online, 2024. Accessed: 18-08-2024.
[11] LibreOffice Documentation Team. LibreOffice 4.1 Calc Guide. Google Books, Online, 2024. Accessed: 08-08-2024.
[12] Karl Mernagh and Kevin Mc Daid. Google Sheets vs Microsoft Excel: A comparison of the behaviour and performance of spreadsheet users. Proceedings of the Psychology of Programming Interest Group (PPIG) 2014 Conference, 2014.
Improvement and Evaluation of a Heuristic Method for the Minimal Feedback Arc Set Problem

Jure Pustoslemšek (jp76466@student.uni-lj.si), Ema Črne (ec51731@student.uni-lj.si), Nejc Rihter (nr5256@student.uni-lj.si)
University of Ljubljana, Faculty of Computer and Information Science, Ljubljana, Slovenia

ABSTRACT

This paper addresses the problem of finding minimal feedback arc sets in directed graphs, a critical issue in various domains such as computational biology, scheduling and network analysis. We implement, analyse and improve a novel heuristic approach. Our improved method reuses the original authors' heuristic for reducing solution size and uses other established techniques from both exact and approximate algorithms to speed up the algorithm. The implementation makes use of a fast network analysis library for additional speed-up.

KEYWORDS

Directed graphs, minimum feedback arc set, NP-hard problems, combinatorial optimization, heuristic methods

DOI: https://doi.org/10.18690/um.feri.6.2024.5

1 INTRODUCTION

Directed graphs are a fundamental tool in network theory. They are widely used to model systems where the direction of relationships between entities is crucial. A feedback arc set (abbr. FAS) is a set of edges in a directed graph such that removing those edges from the graph makes it acyclic. While the entire edge set E can trivially serve as a feedback arc set, the challenge lies in finding a minimal feedback arc set, i.e. a feedback arc set with the smallest number of edges.

The minimal FAS problem has various real-world applications. In social network analysis, for instance, a FAS is crucial for misinformation removal and label propagation [2]; it also plays an important role in computational biology and neuroscience [7], and in task scheduling, where it breaks cyclic dependencies.

The problem of finding a minimal FAS is NP-hard. The decisional version of the problem, that is, finding a FAS of a certain size, is one of the first known NP-complete problems [8]. As such, it is believed that there exists no algorithm that can solve it in polynomial time. Additionally, the problem is also very challenging to approximate. There is no known algorithm with a constant bound on the approximation ratio, making it an APX-hard problem [6]. The problem is complementary to the maximum acyclic subgraph problem, and there is a natural reduction to the linear arrangement problem [3].

In this work, we focus on the implementation and improvement of a heuristic algorithm for finding a minimal FAS, as described by Cavallaro et al. [3]. We shall refer to this algorithm as the original algorithm. The algorithm begins by identifying strongly connected components (abbr. SCC) in the input graph G. Since any cycle is guaranteed to belong to a single SCC [4], we are able to split G into SCCs and run the rest of the algorithm on each SCC separately, collecting the partial solutions into a single list to form a full solution. For the remainder of the algorithm, we assume that G is strongly connected. Two empty lists Ef and Eb are initialized to hold the forward and backward edges respectively. Vertices are then ordered according to some rule. For each vertex, we identify the forward edges and add them to Ef, thereby removing these edges from the graph. Once the graph becomes acyclic during this process, iteration stops. We then iterate in reverse order to identify the backward edges, adding them to Eb and removing them from G. Once the graph becomes acyclic, iteration stops.

To improve the solution, we apply Algorithm 1, also called smartAE [3], to the smaller of the two sets, Ef or Eb. This key component aims to reduce the size of the found FAS by reintroducing the removed edges in a balanced way while avoiding creating cycles in the graph.
Input: Graph G, edge list F
Output: AE

AE ← [ ]
while F is not empty do
    PE ← [ ]; count ← 0; i ← 0
    while i + count < |V(G)| do
        e ← F[i + count]
        Add e to PE
        Add e to G
        if G is acyclic then
            Add e to AE
            count ← count + 1
        else
            Remove e from G
        i ← i + 1
    Remove all of PE from F
return AE

Algorithm 1: The smartAE heuristic.

The smartAE heuristic begins with an acyclic graph G and a list of edges F which were removed from G. An empty list AE is initialized to store the edges that can be successfully reintroduced into the graph without creating a cycle. We iterate through all of the edges in F, where, instead of sequentially reintroducing each edge from F, the algorithm employs a counter count to strategically skip over edges. This approach ensures that we reinsert edges from as many vertices as possible. During each iteration, a temporary list PE tracks the edges being tested. If adding an edge does not create a cycle, it is added to AE and count is incremented. The overall time complexity of the entire algorithm is O(|E|(|V| + |E|)).
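Assuming a networkx digraph (the authors use a faster C++-backed graph library), Algorithm 1 can be sketched as follows. The extra bound on len(F) is our addition to keep the indexing safe; the pseudocode only states i + count < |V(G)|.

import networkx as nx

def smart_ae(G, F):
    # G: acyclic nx.DiGraph; F: list of edges previously removed from G.
    # Returns AE, the edges reinserted without creating a cycle.
    AE = []
    F = list(F)
    while F:
        PE, count, i = [], 0, 0
        while i + count < G.number_of_nodes() and i + count < len(F):
            e = F[i + count]
            PE.append(e)
            G.add_edge(*e)
            if nx.is_directed_acyclic_graph(G):
                AE.append(e)          # keep the edge and skip ahead
                count += 1
            else:
                G.remove_edge(*e)     # the edge closed a cycle
            i += 1
        tested = set(PE)
        F = [e for e in F if e not in tested]
    return AE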
2 IMPROVEMENTS

In this section, we outline our improvements to the original algorithm. The improvements are grouped into four categories: size reduction, vertex ordering strategies, forward/backward edge removal, and acyclicity checks.

2.1 Size reduction

To reduce the size of the input graph G, we apply reduction rules based on a more general method by Baharev et al. [1]. The general method removes edges inside and on the boundary of an induced subgraph H with the following property: the size of a minimal FAS equals the upper bound of the size of the smallest edge set F whose removal breaks all cycles in G with vertices in H. In this case, we remove the edges in F from G. We implemented a few straightforward rules for eliminating some simple and common patterns; we applied the following rules in rounds until a round produced no further reduction in graph size.

(1) For every self-loop e, i.e. an edge with the same entering and exiting vertex, add e to the solution.
(2) For every directed path u v1 ... vn w where v1, ..., vn have in-degree 1 and out-degree 1, delete v1, ..., vn and add the edge uw to G:

    u → v1 → ... → vn → w   ⇒   u → w

(3) For every 2-cycle uv where u has out-degree 1, delete u and add the edge uv to the solution.
(4) For every 2-cycle uv where u has in-degree 1, delete u and add the edge vu to the solution.

Having reduced the graph using the rules described above, we computed the strongly connected components (SCCs) as in the original algorithm. However, after computing the SCCs, we applied a technique described by Park and Akers [10], in which each SCC is further divided into its biconnected components. This efficiently breaks up the graph into even smaller strongly connected subgraphs. As biconnected components are traditionally defined for undirected graphs, we treat the SCCs as undirected to compute these components. The remainder of the algorithm is then applied to these biconnected components. Throughout this article, we will refer to these components as the SCCs.

2.2 Vertex ordering strategies

The original algorithm uses four vertex orderings in its experiments: in-degree and out-degree, both in increasing and decreasing order. We adopt all of these orderings and add five more. The first two additions are based on the difference of the in-degree d−(v) and out-degree d+(v) (1), two more on the ratio of the degrees (2), and we also included a random ordering for comparison.

    degdiff(v) = max(d+(v) − d−(v), d−(v) − d+(v))    (1)
    degratio(v) = max(d−(v)/d+(v), d+(v)/d−(v))    (2)
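Sketched over a networkx digraph, the two additional scoring functions and their use as sort keys look as follows (our illustration; inside a strongly connected component every vertex has in- and out-degree at least 1, so the ratios are well defined):

import networkx as nx

def degdiff(G, v):
    # Equation (1): the absolute difference of out- and in-degree.
    return max(G.out_degree(v) - G.in_degree(v),
               G.in_degree(v) - G.out_degree(v))

def degratio(G, v):
    # Equation (2): the larger of the two degree ratios.
    return max(G.in_degree(v) / G.out_degree(v),
               G.out_degree(v) / G.in_degree(v))

G = nx.gnp_random_graph(50, 0.1, seed=0, directed=True)
order = sorted(G.nodes, key=lambda v: degdiff(G, v), reverse=True)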
This effi-new topological ordering by exploiting this sequence of operations. ciently breaks up the graph into even smaller strongly connected Right before the use of smartAE at the end of forward/backward subgraphs. As biconnected components are traditionally defined edge removal, an acyclic check is performed. During the check, we for undirected graphs, we treat the SCCs as undirected to compute calculate a topological ordering and, if successful, store it in the these components. The remainder of the algorithm is then applied variable𝑇𝑐. We are then able to use this variable to make subsequent to these biconnected components. Throughout this article, we will acyclicity checks trivial. When we restore an edge, we check if it is refer to these components as the SCCs. a backward edge in the 𝑇𝑐. If it is, the graph is no longer acyclic. We 20 Proc. of the 10th Student Computing Research Symposium (SCORES’24), Maribor, Slovenia, October 3, 2024 Improvement and Evaluation of a Heuristic Method for the Minimal Feedback Arc Set Problem move the topological ordering from 𝑇 Table 1: Solution sizes and running times for SNAP instances 𝑐 into a background variable 𝑇𝑝 and unset 𝑇𝑐 . When we remove the last restored edge, we move Instance V-E max-SCC V-E Best result the ordering in 𝑇𝑝 back into 𝑇𝑐. We thus avoid a recalculation in 7115-103689 1300-39456 7966: 14s the next acyclicity check. 6301-20777 2068-9313 531: 15.3s 8114-26013 2624-10776 713: 20.9s 3 EXPERIMENTAL RESULTS 8717-31525 3226-13589 1186: 39s We evaluated our implementation and improvements on the IS-8846-31839 3234-13453 1023: 38.8s CAS circuit benchmark dataset [5] and on a selection of directed 10876-39994 4317-18742 1721: 59.1s networks from the SNAP Large Network Dataset Collection [9]. 22687-54705 5153-17695 1706: 1m37s We experimented with different configurations of the algorithm to 26518-65369 6352-22928 2254: 2m34s assess the effect of each option on both speed and solution size. The 36682-88328 8490-31706 2531: 3m44s ISCAS dataset primarily served as a speed benchmark. The largest 62586-147892 14149-50916 3361: 12m14s instance in this dataset took about 5 to 10 minutes to complete, 75879-508837 32223-443506 141733: 37m53s depending on the configuration; while the implementation of the 265214-418956 34203-151930 61598: 3m31s original algorithm, which used four orderings instead of seven, took 325729-1469679 53968-304685 409862: 53m24s around 90 minutes. This gave us confidence to apply our algorithm 281903-2312497 150532-1576314 313852: 9h43m2s on larger and more diverse networks from the SNAP dataset. 77360-828161 70355-888662 394218: 1h44m37s Our results from testing our implementation on the SNAP dataset 82168-870161 71307-912381 403623: 1h48m43s are presented in Table 1. To illustrate the complexity and size of each graph, the first column lists the number of vertices and edges for each instance, while the second column provides the number surprising. For these instances, smartAE was particularly effective, of vertices and edges in the largest strongly connected component but its effect diminished after reductions. There is one instance for (SCC). The figures in both columns are based on reduced graph. which the algorithm failed to complete within the time limit when The last column presents the best results from our different config-not using reductions, but successfully computed a solution with urations, including the size of the minimal feedback arc set and the configurations that used reductions. 
3 EXPERIMENTAL RESULTS

We evaluated our implementation and improvements on the ISCAS circuit benchmark dataset [5] and on a selection of directed networks from the SNAP Large Network Dataset Collection [9]. We experimented with different configurations of the algorithm to assess the effect of each option on both speed and solution size. The ISCAS dataset primarily served as a speed benchmark. The largest instance in this dataset took about 5 to 10 minutes to complete, depending on the configuration, while the implementation of the original algorithm, which used four orderings instead of seven, took around 90 minutes. This gave us the confidence to apply our algorithm to larger and more diverse networks from the SNAP dataset.

Our results from testing our implementation on the SNAP dataset are presented in Table 1. To illustrate the complexity and size of each graph, the first column lists the number of vertices and edges for each instance, while the second column provides the number of vertices and edges in the largest strongly connected component (SCC). The figures in both columns are based on the reduced graph. The last column presents the best results from our different configurations, including the size of the minimal feedback arc set and the time taken to compute it.

Table 1: Solution sizes and running times for SNAP instances.

Instance V-E        max-SCC V-E       Best result
7115-103689         1300-39456        7966: 14s
6301-20777          2068-9313         531: 15.3s
8114-26013          2624-10776        713: 20.9s
8717-31525          3226-13589        1186: 39s
8846-31839          3234-13453        1023: 38.8s
10876-39994         4317-18742       1721: 59.1s
22687-54705         5153-17695        1706: 1m37s
26518-65369         6352-22928        2254: 2m34s
36682-88328         8490-31706        2531: 3m44s
62586-147892        14149-50916       3361: 12m14s
75879-508837        32223-443506      141733: 37m53s
265214-418956       34203-151930      61598: 3m31s
325729-1469679      53968-304685      409862: 53m24s
281903-2312497      150532-1576314    313852: 9h43m2s
77360-828161        70355-888662      394218: 1h44m37s
82168-870161        71307-912381      403623: 1h48m43s

We implemented the algorithm in Python, using a graph library written in C++. By implementing it this way, the source code is relatively easy to understand while also remaining reasonably fast and efficient. We tested the implementation on the Arnes computing cluster with a 12-hour time limit. Each run gave us solution sizes for all vertex orderings. For runs that did not finish within the time limit, we examined the sizes of the SCCs that were being computed. We searched for the largest SCC for which at least one ordering was computed within the time limit. This provided a rough estimate of the largest SCC size manageable in a parallel or distributed setting where each ordering is processed by two separate threads or nodes, one for forward edge removal and one for backward edge removal. In our experiments this number is between 2.8 and 2.9 million edges. The source code and raw result data are available at [11].

3.1 Impact of size reduction

The first option we tested was the use of size reductions. This reduction has two distinct effects. First, it reduces the size of the input graph. More importantly, the removal of edges and vertices can lead to smaller SCCs, effectively shrinking the problem size and greatly reducing the running time. Second, the reduction rules should help decrease the size of the solution. While we add some removed edges to the solution, those edges are already guaranteed to be in the optimal solution.

For instances whose largest SCC has more than about 14,000 edges, the algorithm's running time was shorter when using reductions, with the time savings increasing with input size. We find that for instances above this complexity, the benefits of reductions outweigh the cost. For a third of the instances, the algorithm produced smaller solutions when reductions were not used, which is quite surprising. For these instances, smartAE was particularly effective, but its effect diminished after reductions. There is one instance for which the algorithm failed to complete within the time limit when not using reductions, but successfully computed a solution with configurations that used reductions.

3.2 Vertex orderings

Different vertex orderings bring drastically different results. The orderings based on degree difference (1) and ratio (2), as well as the random ordering, give consistently worse solutions than those based on in-degree and out-degree. We should note that when comparing apparently analogous orderings, e.g. forward edges of an ascending ordering and backward edges of a descending ordering, there were minor, but non-zero, differences in solution size. Such orderings differed due to the order of vertices with the same score.

The speed of the serial algorithm can be multiplied by a factor of 9/4, with no effect on solution size, by only using the in-degree and out-degree orderings. Speed can be further doubled by only computing forward edge removals, but at the cost of slightly worse solution sizes. This gives us a speedup factor of 9/2 = 4.5. Taking into account the quadratic time complexity, this would theoretically multiply the upper bound on feasible input size by 1.5 with no impact on solution size, or by approximately 2.12 with slightly worse solutions.
3.3 Impact of the SCC-based modification of edge removal

As described in Section 2.3, we implemented a modified version of the edge removal phase that is based on an SCC decomposition. The running time of the algorithm with this modification was longer than that of the unmodified version. For some instances the running time doubled, while for others the impact on running time was less than 10%. The impact on solution size is more complex, though. If smartAE is not used, the solution is always smaller with the modification, as expected. However, when smartAE is applied, the solution can sometimes be larger with the modification. The difference is relatively small in those cases, ranging from 0.04% to 0.5%. This finding is somewhat unexpected, as we expected this method to significantly improve solution size. This and our findings regarding reduction highlight the unpredictability of smartAE. Similarly to our finding on vertex orderings, one can run the algorithm with both edge removal procedures to potentially achieve a slightly better solution size at a cost in running time.

3.4 Impact of smartAE

The addition of smartAE [3] roughly doubles the running time in all cases without the modified edge removal phase. The improvement in solution size, however, is heavily dependent on input size. Generally, larger graphs exhibit smaller improvements compared to smaller graphs. This is clearly illustrated in Figure 1, which shows the percentage of edges selected for the FAS solution, both with and without the smartAE procedure. The red portion of the bars represents the improvement achieved with smartAE. For smaller graphs, smartAE can reduce the solution size by half, whereas for the largest tested graphs, the improvement is as little as 1%. This observation raises questions about the usefulness of smartAE for very large graphs.

Figure 1: Percentage of solution edges compared to all edges in the graph, with and without the smartAE procedure (x-axis: number of edges per instance; y-axis: solution size as a percentage of all edges).

4 CONCLUSIONS AND FUTURE WORK

The heuristic algorithm presented in this paper has proven effective in computing small feedback arc sets in large graphs. Multiple configurations have been tested with varying degrees of success. Our implementation achieved comparable solution quality in significantly less time than the original approach on the same dataset, and it was also able to handle larger graphs. By employing size reduction techniques, we effectively decreased the complexity of the input graphs, leading to faster processing times and more manageable SCCs, especially in larger graphs. We further reduced computational complexity by avoiding unnecessary recalculations during the smartAE process and by splitting SCCs into biconnected components. The experiments also revealed that the selection of the vertex ordering has a great impact on both the speed and the quality of the solutions. Orderings based on in-degree and out-degree, particularly in descending and ascending orders, consistently outperformed other strategies. The modification covered in Section 3.3 did not consistently improve performance and, when it did, the benefits were usually insignificant. Additionally, it led to less predictable results when smartAE was applied. The smartAE heuristic, while effective for reducing solution sizes in smaller graphs, showed lesser returns as graph size increased, raising questions about its efficiency and practicality for larger graphs.

However, there is still room for improvement. One avenue is the speedup described in Section 3.3, but there are also others. An obvious improvement is to implement the algorithm entirely in a compiled language like C++, eliminating the significant overhead of an interpreter. We also plan to add more complex reduction rules based on Baharev's method [1], for example by searching for instances of tournaments or other common patterns and reducing based on those.

ACKNOWLEDGMENTS

We sincerely thank assist. prof. dr. Uroš Čibej for his guidance and advice, and for introducing us to this topic.

REFERENCES
[1] Ali Baharev, Hermann Schichl, Arnold Neumaier, and Tobias Achterberg. 2021. An exact method for the minimum feedback arc set problem. ACM J. Exp. Algorithmics 26, Article 1.4 (April 2021), 28 pages. https://doi.org/10.1145/3446429
[2] Ceren Budak, Divyakant Agrawal, and Amr El Abbadi. 2011. Limiting the spread of misinformation in social networks. In Proceedings of the 20th International Conference on World Wide Web. Association for Computing Machinery, 665–674. https://doi.org/10.1145/1963405.1963499
[3] Claudia Cavallaro, Vincenzo Cutello, and Mario Pavone. 2023. Effective heuristics for finding small minimal feedback arc set even for large graphs. In CEUR Workshop Proceedings, Vol. 3606. https://ceur-ws.org/Vol-3606/paper56.pdf
[4] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2009. Introduction to Algorithms (3rd ed.). MIT Press.
[5] Ali Dasdan. 2004. Experimental analysis of the fastest optimum cycle ratio and mean algorithms. ACM Transactions on Design Automation of Electronic Systems 9, 4 (2004), 385–418. https://doi.org/10.1145/1027084.1027085
[6] Guy Even, Joseph Naor, Baruch Schieber, and Leonid Zosin. 1997. Approximating the minimum feedback arc set in tournaments. In Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, 464–472.
[7] I. Ispolatov and S. Maslov. 2008. Detection of the dominant direction of information flow and feedback links in densely interconnected regulatory networks. BMC Bioinformatics 9 (2008), 424. https://doi.org/10.1186/1471-2105-9-424
[8] Richard M. Karp. 1972. Reducibility among combinatorial problems. In Complexity of Computer Computations, Raymond E. Miller and James W. Thatcher (Eds.). Springer US, Boston, MA, 85–103. https://doi.org/10.1007/978-1-4684-2001-2_9
[9] Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data
[10] S. Park and S. B. Akers. 1992. An efficient method for finding a minimal feedback arc set in directed graphs. In Proceedings of the 1992 IEEE International Symposium on Circuits and Systems, Vol. 4, 1863–1866. https://doi.org/10.1109/ISCAS.1992.230449
[11] Jure Pustoslemšek, Ema Črne, and Nejc Rihter. 2024. Implementation of a smartAE-based minFAS algorithm. https://github.com/jurepustos/fas-smartAE/. [Online; accessed 26-July-2024]

Counter-Strike Character Object Detection via Dataset Generation

Matija Šinko (matija11.sinko1@gmail.com)
University of Maribor, Faculty of Electrical Engineering and Computer Science, Maribor, Slovenia

ABSTRACT

This paper addresses the challenge of developing robust object detection systems in the context of Valve's Counter-Strike by introducing a novel, high-quality dataset generated using a complex image generator built within the Unity game engine. This generator mimics the original game's environment and character interactions, capturing the complexity of in-game scenarios. The dataset provides a valuable resource for training models like the YOLOv9 algorithm, which we employ to develop an object detection system that achieves high precision and recall, in turn proving the usability of our dataset. Our dataset and demonstrated model could be used for object detection in future multi-modal autonomous agents, like the one we propose at the end of the paper.

KEYWORDS

Counter-Strike, object detection, AI, YOLO, autonomous agents, computer vision, data generation

DOI: https://doi.org/10.18690/um.feri.6.2024.6
1 INTRODUCTION

Artificial intelligence (AI) has opened new possibilities in gaming, with achievements like AlphaGo and AlphaStar mastering complex environments [3, 13]. These advances have sparked interest in applying AI to automate popular games. For fans of first-person shooters like Valve's Counter-Strike [12], this raises questions about AI-driven gameplay in fast-paced, strategic environments [4].

The primary problem addressed in this paper is the development of a robust object detection system for Counter-Strike, which is a crucial component for creating autonomous agents capable of human-level gameplay. Traditional approaches have relied on manually labeled datasets, which are time-consuming to create and often struggle to keep up with game updates. Additionally, previous attempts at developing AI agents for Counter-Strike have typically employed single, monolithic neural networks, leading to mixed results in terms of performance and adaptability.

In response to these challenges, we propose a novel solution: a high-quality, automatically generated dataset created using a complex image generator within the Unity game engine. This dataset is designed to train object detection models like YOLOv9 [14], which we use to identify enemy players in Counter-Strike. Furthermore, we suggest that this dataset can be a foundation for multi-model agent architectures, which could offer more robust and adaptable AI systems for gaming.

The structure of this paper is as follows: First, we provide a detailed overview of the methodology used to generate the dataset and train the object detection model. Next, we present the results of our experiments, demonstrating the effectiveness of our approach. Finally, we discuss the potential applications of the dataset in multi-model systems and conclude with suggestions for future research.

2 METHODOLOGY

2.1 Overview of Dataset Generation

This section outlines our approach to generating a dataset for training an object detection model in Valve's Counter-Strike. Using Unity, we closely replicated in-game environments and characters to ensure the training data accurately reflects real gameplay conditions.

2.2 Image Generation Pipeline

Our image generation pipeline was built for flexibility and iterative improvement, allowing quick updates to enhance model training and evaluation. This approach was key to creating a diverse and robust dataset with various in-game scenarios (see Figure 1).

Figure 1: Object detection with the image generation pipeline.

2.3 Detailed Environment and Character Simulation

A critical aspect of our dataset generation process was ensuring that the simulated environments and character models closely resembled those in the actual Counter-Strike game, similar to approaches using synthetic data in the Unity and Unreal engines for realistic object detection [1, 10].

2.3.1 Lighting and Rendering. We used Unity's High Definition Render Pipeline (HDRP) [11] to recreate the complex lighting conditions of Counter-Strike. Realistic lighting made the dataset closely mirror the game, helping the model generalize to real gameplay.

2.3.2 Character Positioning and Inverse Kinematics (IK). To create realistic and varied character poses, we applied Inverse Kinematics (IK) algorithms [15]. IK enabled dynamic posing of characters in natural scenarios like aiming, shooting, and navigating.
2.4 Object Detection and YOLO Algorithm
Object detection is key to validating our dataset, focused on identifying enemy players in the game. We used the YOLOv9 algorithm, known for its speed and accuracy, to demonstrate the dataset's effectiveness. Training YOLOv9 on our dataset confirmed its utility. Though this study centers on YOLOv9, similar detectors could be used in more complex multi-modal agents in the future.
3 THE PROPOSED METHOD
3.1 Detailed Dataset Generation Process
As discussed earlier, our object detector was trained on a large Unity-generated dataset comprising varied scenarios on a detailed Dust 2 map with both Terrorist and Counter-Terrorist characters.

3.1.1 Environment and Character Creation.
(1) Map Recreation: The Dust 2 map was decompiled from Counter-Strike 2 game files and imported into Unity 3D. After cleaning up the geometry, we obtained a 1:1 replica of the map (see Figure 2).
(2) Character Models: We extracted and rigged character models for Terrorists and Counter-Terrorists using Blender, then imported them into Unity with various poses and weapons (see Figure 6 in the Appendix).
(3) Lighting and Rendering: Unity's High Definition Render Pipeline (HDRP) was employed to set up realistic lighting conditions, using path tracing technology to closely mimic the visuals of Counter-Strike 2 (see Figure 3).

Figure 2: Comparison of Dust 2 in the real game (left) and the Unity recreation (right) (Left source: Counter-Strike 2 during authors' gameplay)

Figure 3: Comparison: Real Counter-Terrorist (left) and Unity recreation (right) (Left source: Counter-Strike 2 during authors' gameplay)

3.1.2 Data Annotation and Generation.
(1) Random Placement: Characters were randomly placed around the map, and virtual cameras were positioned to capture various perspectives.
(2) Dynamic Posing: Using the Final IK plugin for inverse kinematics [7], characters were given dynamic poses aimed at different targets, adding variability to the training data.
(3) Labeling: Our system automatically annotated images with bounding boxes for each character, distinguishing between Terrorists, Counter-Terrorists, and their states (alive or dead).
4 RESULTS
4.1 Purpose of the Experimental Work
The goal of our experimental work was to train a YOLOv9 object detector with sufficient accuracy and recall. This would be critical for potential future use in multi-model autonomous agents, where accuracy would be needed to distinguish between friendly and enemy characters, as well as dead and alive ones. High recall would be vital for quickly and reliably spotting characters and differentiating them from the background, which would be essential for agents that rely on object detection models for shooting tasks.

4.2 Comparative Studies and Setups
We iteratively refined our image generator, creating datasets to train and evaluate the object detection model, with each iteration enhancing accuracy and recall:
(1) Initial Model: A dataset of 10,000 images featuring only alive characters in various poses, serving as the baseline.
(2) Addition of Dead Characters: Introduced separate labels for dead characters to improve recall, especially in distinguishing live enemies from the background.
(3) First-Person Perspective and UI Overlays: Added hands and UI elements to reduce misclassification of player arms as enemies.
(4) Blood Splatter Effects: Introduced blood effects to enhance precision in differentiating character states.
(5) Name Tag-Based Identification: Added name tags to distinguish friendly from enemy characters, as friendlies always have tags above their heads. Some tags without visible characters belong to friendlies behind walls, which we avoid classifying. This approach required training two separate models, one for each team.
These were evaluated for precision and recall improvements.

4.3 Datasets Used for Testing
To ensure the robustness and generalizability of the object detection model, we evaluated it on a variety of datasets:
(1) Self: This dataset was generated using the same methods as the training set, serving as a control to measure overfitting and baseline performance.
(2) Bots CT and T Combined: Captured from bot games in Counter-Strike, this one did not include unique player skins. This dataset is the closest to a final deployment scenario.
(3) Batch 3: This dataset included captures from real multiplayer games, featuring unique player skins and a variety of in-game environments.
(4) Batch 4: Similar to Batch 3 but sourced from different multiplayer sessions, providing additional variety in testing conditions.
(5) Batch 1: Captured from the Deathmatch mode, this dataset differs most from the intended deployment environment but provides insights into model generalization.
(6) Bots, Batch 3, Batch 4 Combined: A comprehensive dataset combining Bots CT and T, Batch 3, and Batch 4, used to test overall model performance across various conditions.
(7) All Combined: A super-set combining all the above datasets.
These datasets tested the model's strengths and weaknesses.

4.4 Evaluation Metrics
We measured various evaluation metrics, including Precision, Recall, F1 Score, mAP, and IoU, with detailed results available for each in the Appendix. Our focus was on Precision and Recall. High Precision is vital for accurately identifying enemy characters, avoiding misclassification of friendly units, while high Recall ensures all enemies are detected quickly and distinguished from the background. These metrics are crucial for creating robust object detection models that could be used for potential future autonomous agents in competitive gaming.
These findings support training of agents that can accurately differentiate between alive our initial hypothesis that our dataset could be used to train more friendly and enemy characters. The dead characters need only to complex multi-model agents. 25 Proc. of the 10th Student Computing Research Symposium (SCORES’24), Maribor, Slovenia, October 3, 2024 Matija Šinko Table 1: Detailed Metrics for Class: Terrorist 5.3 Future Work Metric Bots CT and T Combined Bots, Batch 3, Batch 4 Combined Looking forward, there are several avenues for enhancing the ca-Ours Siromer PyLessons Ours Siromer PyLessons pabilities of our system: Precision 0.97 0.82 0.67 0.87 0.81 0.68 Recall 0.91 0.79 0.41 0.60 0.48 0.37 F1 Score 0.94 0.80 0.51 0.71 0.60 0.47 • Expansion of Dataset and Model Generalization: Future work will focus on expanding the dataset to include additional 4.6.6 Video Demonstrations of Model Performance. We provide decompiled maps and the introduction of random cosmetic skin video demonstrations of our object detection model’s performance patterns to improve the model’s generalization. Additionally, for both Counter-Terrorist and Terrorist scenarios. Available on incorporating YOLOv9 pose estimation will allow for the identi-YouTube: fication of specific character body parts, thereby enhancing the model’s ability to aim and shoot with greater effectivenes in a • Counter-Terrorist: www.youtube.com/watch?v=u49CLDt8MgU potential reinforcement learning framework. • Terrorist: www.youtube.com/watch?v=u49CLDt8MgU • Proposed Multi-Model Agent Architecture: We propose a 4.7 Discussion complex multi-model architecture that could serve as the foundation for developing autonomous agents capable of high-level The improvements through our image generation pipeline have suc-gameplay in Counter-Strike see Appendix A.7. cessfully enhanced model performance. The precision achieved by • Planned Dataset Publication: We plan to publish our dataset the model is sufficient to avoid mistakenly identifying teammates on platforms like Kaggle, Hugging Face, and Roboflow, allow-or dead characters as threats, while the recall is robust enough to ing others to use it for developing agents and practicing object reliably detect enemy players. Any remaining inaccuracies could detection skills. be further refined with reinforcement-based shooting models that adapt to detection patterns when used on our generated dataset. 5.4 Final Thoughts In comparison to previous studies, such as [4], which encountered This study underscores the potential of synthetic data and iterative issues like agents mistakenly shooting dead characters, our ap-model development in advancing AI for gaming. While challenges proach offers a clear advantage. The dataset we generated, paired remain, particularly in bridging the gap between synthetic and real-with a specific object detection model, can relieve a potential agent world data, the progress made here provides a solid foundation for from the need to learn the shapes of characters, allowing it to fo-future innovations. The proposed multi-model architecture repre-cus more on tasks like shooting accuracy and map traversal when sents a promising direction for developing more sophisticated and learning. While our model shows promise, it is currently limited to capable autonomous agents, capable of performing at a high level detecting objects within the Dust 2 map and is sensitive to player in complex gaming environments like Counter-Strike. As AI con-cosmetic skins. 
5 CONCLUSION
5.1 Summary of Work and Key Findings
This research focused on the development and validation of a high-quality dataset generated using a Unity-based image generator for training object detection models in the context of Counter-Strike. Through iterative enhancements to our image generation pipeline, we achieved significant improvements in both precision and recall of the YOLOv9-based object detection model. This validated the effectiveness of our approach, demonstrating that synthetic data can effectively train models for complex in-game scenarios.

5.2 Best Results and Contributions
Our study made several key contributions to the field:
• Versatile Dataset for Object Detection: Our image generator and datasets are valuable for training object detection models in Counter-Strike.
• Effective Use of Synthetic Data Proven by Object Detection: We showed that synthetic data can replace real-world data in training models, especially when labeled data is scarce. This was proven by achieving strong results with an object detection model trained on our generated data.
• Future Applications: Our work could be incorporated into future autonomous agents or used as object detection teaching exercises.

5.3 Future Work
Looking forward, there are several avenues for enhancing the capabilities of our system:
• Expansion of Dataset and Model Generalization: Future work will focus on expanding the dataset to include additional decompiled maps and the introduction of random cosmetic skin patterns to improve the model's generalization. Additionally, incorporating YOLOv9 pose estimation will allow for the identification of specific character body parts, thereby enhancing the model's ability to aim and shoot with greater effectiveness in a potential reinforcement learning framework.
• Proposed Multi-Model Agent Architecture: We propose a complex multi-model architecture that could serve as the foundation for developing autonomous agents capable of high-level gameplay in Counter-Strike (see Appendix A.7).
• Planned Dataset Publication: We plan to publish our dataset on platforms like Kaggle, Hugging Face, and Roboflow, allowing others to use it for developing agents and practicing object detection skills.

5.4 Final Thoughts
This study underscores the potential of synthetic data and iterative model development in advancing AI for gaming. While challenges remain, particularly in bridging the gap between synthetic and real-world data, the progress made here provides a solid foundation for future innovations. The proposed multi-model architecture represents a promising direction for developing more sophisticated and capable autonomous agents, capable of performing at a high level in complex gaming environments like Counter-Strike. As AI continues to evolve, integrating reinforcement learning and advanced detection techniques will be crucial in pushing the boundaries of what these agents can achieve.

REFERENCES
[1] Per-Arne Andersen, Teodor Aune, and Daniel Hagen. 2022. Development of a Novel Object Detection System Based on Synthetic Data Generated from Unreal Game Engine. Applied Sciences 12 (08 2022).
[2] Chenyang Dai. 2021. Counter-Strike Self-play AI Agent with Object Detection and Imitation Training. CS230: Deep Learning, Fall 2021, Stanford University, CA. (LaTeX template borrowed from NIPS 2017.)
[3] Guillaume Lample and Devendra Singh Chaplot. 2018. Playing FPS Games with Deep Reinforcement Learning. arXiv:1609.05521 [cs.AI]
[4] Tim Pearce and Jun Zhu. 2021. Counter-Strike Deathmatch with Large-Scale Behavioural Cloning. arXiv:2104.04258 [cs.AI]
[5] Python Lessons. 2020. TensorFlow 2.3.1 YOLOv4 - CSGO Aimbot. Accessed: 2024-08-30.
[6] Python Lessons. 2020. YOLOv4 TensorFlow 2.3.1 - CSGO Aimbot. Accessed: 2024-08-30.
[7] RootMotion. 2014. Final IK: The Ultimate IK Solution for Unity. Accessed: 2024-08-30.
[8] Faruk Günaydin (Siromer). 2024. Counter-Strike 2 Body and Head Classification. Accessed: 2024-08-25.
[9] Faruk Günaydin (Siromer). 2024. CS:GO/CS2 Object Detection. Accessed: 2024-08-25.
[10] Steve Borkman, Adam Crespi, et al. 2021. Unity Perception: Generate Synthetic Data for Computer Vision. arXiv:2107.04259 [cs.CV]
[11] Unity Technologies. 2024. High Definition Render Pipeline (HDRP). Accessed: 2024-08-30.
[12] Valve Corporation. 2023. Counter-Strike 2. Accessed: 2024-08-30.
[13] Hado van Hasselt, Arthur Guez, and David Silver. 2015. Deep Reinforcement Learning with Double Q-learning. arXiv:1509.06461 [cs.LG]
[14] Chien-Yao Wang, I-Hau Yeh, and Hong-Yuan Mark Liao. 2024. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. arXiv:2402.13616 [cs.CV]
[15] Chris Welman. 1993. Inverse Kinematics and Geometric Constraints for Articulated Figure Manipulation. Simon Fraser University.
A APPENDIX
A.1 Detailed Dataset Generation Process

Figure 6: Various character poses for Terrorists and Counter-Terrorists, each aiming at a target (source: Our Counter-Strike recreation inside Unity 3D)

A.2 Model Upgrades
Figures related to the model upgrades and their detailed results can be found here:

Figure 7: Alive Counter-Terrorist and Terrorist characters (source: Our Counter-Strike recreation inside Unity 3D)

Figure 8: Dead and alive characters (source: Our Counter-Strike recreation inside Unity 3D)

Figure 9: First-person hands and UI (source: Our Counter-Strike recreation inside Unity 3D)

Figure 10: Blood next to a dead character (source: Our Counter-Strike recreation inside Unity 3D)

Figure 11: Image for the Counter-Terrorist model with a name tag over a Counter-Terrorist and a Terrorist with no name tag (source: Our Counter-Strike recreation inside Unity 3D)

A.3 Cross Model Examination Data

Figure 12: Counter-Terrorist class across all of our generation upgrades

Figure 13: Terrorist class across all of our generation upgrades

Figure 14: Counter-Terrorist dead class across all of our generation upgrades

Table 2: Precision for Class: Terrorist

Data Set                      Self    CT and T Combined   Batch 1   Batch 3 & 4
Only Alive                    0.972   0.901               0.758     0.802
Dead and Alive                0.944   0.880               0.670     0.890
Hands w/ Guns & UI            0.937   0.857               0.752     0.812
Added Blood Splatter Stains   0.945   0.840               0.760     0.879

Table 3: Recall for Class: Terrorist

Data Set                      Self    CT and T Combined   Batch 1   Batch 3 & 4
Only Alive                    0.909   0.479               0.421     0.357
Dead and Alive                0.894   0.736               0.530     0.395
Hands w/ Guns & UI            0.887   0.749               0.483     0.418
Added Blood Splatter Stains   0.854   0.775               0.583     0.428

Table 4: F1-Score for Class: Terrorist

Data Set                      Self    CT and T Combined   Batch 1   Batch 3 & 4
Only Alive                    0.940   0.626               0.541     0.494
Dead and Alive                0.919   0.802               0.592     0.547
Hands w/ Guns & UI            0.911   0.799               0.589     0.552
Added Blood Splatter Stains   0.897   0.806               0.660     0.576

Table 6: Detailed Metrics for Class: Counter Terrorist Dead

Data Set                      Self    CT and T Combined   Batch 1   Batch 3 & 4
Only Alive                    -       -                   -         -
Dead and Alive                0.861   0.650               0.315     0.270
Hands w/ Guns & UI            0.853   0.483               0.552     0.294
Added Blood Splatter Stains   0.841   0.553               0.446     0.222

A.4 Testing Different Dataset Sizes

Figure 15: All metrics (classes) combined for different dataset sizes

Table 5: Combined Metrics for Class: All Metrics

Data Set             Self    CT and T Combined   Batch 1   Batch 3 & 4
Blood Dataset 1k     0.919   0.598               0.544     0.448
Blood Dataset 5k     0.920   0.608               0.519     0.416
Blood Dataset 10k    0.916   0.714               0.575     0.536
Blood Dataset 20k    0.919   0.665               0.574     0.464
Blood Dataset 100k   0.905   0.737               0.568     0.460

A.5 Bigger Image Sizes and a Bigger Model

Figure 16: YOLO model sizes

YOLO offers five different model sizes (see Figure 16), each with a trade-off between accuracy and training/inference speed.
All of our experiments so far have been conducted using the smallest model, YOLOv9t, due to its faster inference time. The reasoning is that if our object detector is to be used as part of a real-time autonomous agent, we need the fastest possible response time (inference time). Similarly, we have used images with dimensions of 640x360 for all experiments so far, as larger images provide better accuracy but at the cost of slower processing speeds. We have also explored dataset sizes, which show a similar trend: larger datasets yield better accuracy but result in longer training times. Such adjustments, such as changing the model size, image size, or dataset size, are dynamic and easy to make when using an image generator like ours, but are very rigid and time-consuming in traditional hand-labeling scenarios.
In this section, we examine how different model sizes, image dimensions, and dataset sizes affect accuracy and compare their performance. Here we compare four different models to see how effective the different methods (image size, dataset size, model size) are:
• A 10k YOLOv9 tiny model with image size (640x360)
• A 100k YOLOv9 tiny model with image size (640x360)
• A 10k YOLOv9 model with image size (640x360)
• A 10k YOLOv9 tiny model with image size (1280x720)

Figure 17: 1280x720 model next to previous models

Figure 18: Counter-Terrorist dead class across all of our generation upgrades

Table 7: Performance Metrics for Class: Counter Terrorist

                                  10k T                  100k T                 10k C
Data Set                          Prec   Rec    F1       Prec   Rec    F1       Prec   Rec    F1
Self                              0.937  0.856  0.895    0.924  0.894  0.909    0.937  0.905  0.921
Bots CT and T Combined            0.862  0.632  0.730    0.862  0.645  0.738    0.910  0.595  0.719
Batch 1                           0.787  0.592  0.676    0.792  0.503  0.615    0.837  0.556  0.668
Bots, Batch 3, Batch 4 Combined   0.761  0.550  0.638    0.700  0.553  0.618    0.846  0.466  0.601

Figure 19: 1280x720 model compared to previous models

From the above bar charts, we can deduce that having a bigger dataset, choosing a bigger model, and using larger image sizes all help improve our model's performance. The best performance was observed in the model trained on larger images. Future work could involve training a model on 100k images of size (1280x720). Figures 17 and 19 show how the (1280x720) model compares to our previous generation upgrades. We can see that the higher image size aligns with our trend of increasing recall, which was our original goal.

A.6 Comparison with Other Counter-Strike Object Detectors
In this section, we present a broader comparison of our model against those trained on Siromer's and PythonLessons' datasets across more metrics and datasets.
Figure 20: Comparison of object detection models expanded on the class "Terrorist"

Table 8: Expanded Metrics for Class: Terrorist

Metric: Precision
Data Set          Self   Bots CT & T   Batch 1   Bots, B3, B4
Ours (1280x720)   0.97   0.97          0.75      0.87
Siromer           0.90   0.82          0.76      0.81
PyLessons         0.77   0.67          0.42      0.68

Metric: Recall
Ours (1280x720)   0.93   0.91          0.52      0.60
Siromer           0.87   0.79          0.61      0.48
PyLessons         0.90   0.41          0.30      0.37

Metric: F1 Score
Ours (1280x720)   0.95   0.94          0.61      0.71
Siromer           0.88   0.80          0.67      0.60
PyLessons         0.83   0.51          0.35      0.47

Metric: mAP50
Ours (1280x720)   0.93   0.91          0.52      0.60
Siromer           0.87   0.79          0.61      0.48
PyLessons         0.90   0.41          0.30      0.37

Metric: mAP50_95
Ours (1280x720)   0.97   0.92          0.57      0.73
Siromer           0.86   0.87          0.67      0.59
PyLessons         0.88   0.53          0.26      0.45

Metric: Fitness
Ours (1280x720)   0.82   0.51          0.28      0.33
Siromer           0.61   0.58          0.41      0.37
PyLessons         0.64   0.31          0.14      0.25

As shown in Figure 20 and Table 8, our model outperforms the other two models across all metrics, except when evaluated on the "Batch 1" dataset. This is not entirely surprising. As previously mentioned, our best-performing model relies heavily on the effectiveness of name tags above characters' heads to distinguish friendlies from enemies. This approach led to significant improvements but performed worse on datasets that were captured in game modes like "Deathmatch," where no name tags are displayed above friendly characters. Batch 1, as stated earlier, was captured in such a deathmatch game mode. Therefore, more traditional approaches, such as those using Siromer's and PythonLessons' datasets, generalize better in this scenario. However, this is not a major concern for us since our model is designed for the competitive game mode, where two teams of five players face off and friendly characters always have name tags above their heads.

Figure 21: Comparison of object detection models expanded on the class "Counter-Terrorist"

Table 9: Expanded Metrics for Class: Counter-Terrorist

Metric: Precision
Data Set          Self   Bots CT & T   Batch 1   Bots, B3, B4
Ours (1280x720)   0.97   0.83          0.73      0.55
Siromer           0.91   0.81          0.78      0.68
PyLessons         0.81   0.54          0.40      0.39

Metric: Recall
Ours (1280x720)   0.91   0.76          0.60      0.44
Siromer           0.95   0.56          0.62      0.61
PyLessons         0.75   0.32          0.33      0.44

Metric: F1 Score
Ours (1280x720)   0.94   0.80          0.66      0.49
Siromer           0.93   0.66          0.69      0.64
PyLessons         0.78   0.40          0.36      0.41

Metric: mAP50
Ours (1280x720)   0.91   0.76          0.60      0.44
Siromer           0.95   0.56          0.62      0.61
PyLessons         0.75   0.32          0.33      0.44

Metric: mAP50_95
Ours (1280x720)   0.95   0.76          0.60      0.41
Siromer           0.97   0.62          0.69      0.63
PyLessons         0.83   0.31          0.24      0.33

Metric: Fitness
Ours (1280x720)   0.78   0.29          0.32      0.21
Siromer           0.74   0.45          0.46      0.39
PyLessons         0.53   0.19          0.12      0.18

As seen in Figure 21 and Table 9, our model performs noticeably worse on the "Counter-Terrorist" class compared to the "Terrorist" class shown in Figure 20. In some instances, it is even outperformed on the "Bots, Batch 3, Batch 4 Combined" dataset.
This outcome is also expected because our latest model, which leverages name tags, requires training two separate models: one for when the friendly team is "Terrorists" and another for when the friendly team is "Counter-Terrorists." This approach results in some asymmetry in the detection performance between the two classes. The data shown in the previous two figures was obtained using the model trained when the friendly team was "Terrorists." Results for the model trained when the friendly team was "Counter-Terrorists" showed complementary outcomes, with the "Counter-Terrorist" team being detected more effectively. The differences, however, are minor. We chose to present the results from the "Terrorist" team model because it yielded slightly better results. In future work, it may be more beneficial to average the results across the two team-based models for each class.
Lastly, it is important to emphasize that the most relevant dataset for comparison is the "Bots CT and T Combined," as it most accurately represents competitive environments simulating real matches with bot opponents and teammates. The other datasets introduce more variability in the form of custom player cosmetics, which our generator does not currently support. However, this is unnecessary if our goal is to use our object detector exclusively for bot matches.

A.7 Future Work and Proposed Multi-Model Architecture
Proposed Multi-Model Agent Architecture: We propose a comprehensive multi-model architecture that could serve as the foundation for developing autonomous agents capable of high-level gameplay in Counter-Strike:
(1) Object Detection for Enemy Players: This model functions as the agent's eyes, focusing on detecting and classifying enemy players, enabling the agent to focus on shooting accuracy and map traversal.
(2) Reinforcement Learning for Aiming and Shooting: Acting as the agent's arms, this model uses reinforcement learning to aim and shoot at detected enemies, adjusting its behavior based on skill levels.
(3) Object Detection on the In-Game Minimap: This complementary model identifies player positions on the in-game minimap, providing additional spatial awareness.
(4) Decision Making for Player Movement: Utilizing minimap data, this model determines the agent's movement strategy, optimizing its position on the map through supervised or reinforcement learning.
(5) 3D Environment Modeling and Detection: Enhancing environmental perception, this model employs techniques like SLAM or MiDaS to build a 3D understanding of the game world.
(6) Dynamic Map Traversal: Leveraging the 3D environment model, this model navigates the map dynamically, utilizing pathfinding algorithms or reinforcement learning to simulate player inputs.

Cross-Lingual False Friend Classification via LLM-based Vector Embedding Analysis

Mitko Nikov (mitko.nikov@student.um.si), Žan Tomaž Šprajc (zan.sprajc@student.um.si), Žan Bedrač (zan.bedrac@student.um.si)
University of Maribor, Faculty of Electrical Engineering and Computer Science, Maribor, Slovenia

DOI: https://doi.org/10.18690/um.feri.6.2024.7    ISBN 978-961-286-914-4

ABSTRACT
In this paper, we propose a novel approach to exploring cross-linguistic connections, with a focus on false friends, using Large Language Model embeddings and graph databases. We achieve a classification performance on the Spanish-Portuguese false friend dataset of F1 = 83.81% using BERT and a multi-layer perceptron neural network. Furthermore, using advanced translation models to match words between vocabularies, we also construct a ground truth false friends dataset between Slovenian and Macedonian - two languages with significant historical and cultural ties. Subsequently, we construct a graph-based representation using a Neo4j database, wherein nodes correspond to words, and various types of edges capture semantic relationships between them.
KEYWORDS
false friends, large language models, BERT, linguistics, natural language processing

1 INTRODUCTION
When observing individual languages, we come across homonyms, which are words that have the same spelling or pronunciation but varied meanings, such as the word "bat", which pertains to either the animal or the sports requisite. As we move from the confines of one language and observe two, we encounter chance false friends [10]. These have the same spelling but varied etymologies and meanings in different languages, such as the English word "in", which in Slovenian means "and". So, we decided to pivot our observation further and focus solely on words that have the same etymological origin and spelling whilst having different meanings in different languages, so-called semantic false friends [10, 12, 16].
A similar endeavour was undertaken by Ljubešić & Fišer [13], which attempted to identify true equivalents, partial false friends, and false friends in Slovenian and Croatian based on their spelling and semantic meaning. Our analysis will also touch on true equivalents (word pairs with the same meaning and usage [13]), partial false friends (pairs that alternate between polysemy and false friends [13]), and pure false friends.
An initial step to finding false friends could be lemmatization-based tagging [4], which is able to differentiate between parts of speech, reducing words to their root form. In practice, this means that a verb like "working" is reduced to its root "work". Stemming is another alternative, which has already been applied to Czech together with a language-independent approach (n-grams) [9]. However, even though lemmatization proved effective for two other South Slavic languages, Croatian and Serbian [4], in our case, we expect the declension differences between Slovenian and Macedonian to be too significant for such a preprocessing step to be used.
A recently introduced method for automatic false friends detection in related languages [6] uses a linear transformation between the two vector spaces in both languages to isolate false friends. The linear transformation acts as a translation between the two languages. They [6] expect that one vector in one language should be close to its cognate partner [5] in the other language after the linear transformation; however, for false friends, this should not be the case. They use the Spanish and Portuguese Wikipedia as a corpus for the unsupervised learning of the Word2Vec models [15]. Since the linear transformation is a bijection, each vector in one of the languages is uniquely mapped to a vector in the other language. It is impossible for such a model to account for the different meanings one word can have. To solve this issue, we propose an improvement to this method by extending the vector space to use LLM embeddings [18] of meanings instead of single words.
Regarding the false friend classification between Macedonian and Slovenian, we needed to take a different approach to ground truth dataset creation. Our approach is based on finding words with the same spelling in Slovenian and Macedonian, translating them to English using a pre-trained bidirectional translator API, and matching false friends accordingly. This approach also yields an unexpected amount of true friends, which are also useful to us. A prime example of a false friend would be the word "obraz", which in Slovenian means "face" and in Macedonian means "cheek". On the other hand, a true friend would be the word "jagoda", which means "strawberry" in both Slovenian and Macedonian. These and a few other examples are given in Table 1.
In the following sections, we will describe the methodology that we used to classify false friends, as well as the methodology used to create a ground truth dataset for Slovenian and Macedonian false friends. Our overview of the classification process will be based on BERT as a word to vector model, with special attention given to the embedding extraction process. Moreover, we will dive into our methodology for ground truth creation, with a special emphasis on the issues that result from the translation of words that have multiple meanings. We will also describe the graph database representation of the false friends dataset. Finally, we will present our results, comparing our methodology to an existing one. We will evaluate our results in terms of precision, recall, and F1 scores, and provide a summary of our false friends ground truth dataset.

Table 1: Examples of true and false friends in the Slovenian and Macedonian language.

Slovenian word   Macedonian word   Slovenian meaning   Macedonian meaning   Type of word match
obraz            образ (obraz)     face                cheek                false friend
lice             лице (lice)       cheek               face                 false friend
deka             дека (deka)       blanket             that                 false friend
čas              час (čas)         time                time/hour            partial false friend
jagoda           јагода (jagoda)   strawberry          strawberry           true friend
kraj             крај (kraj)       edge/end/region     edge/end/region      true friend

2 METHODOLOGY
Recently, Large Language Models (LLMs) and advanced tokenizers have revolutionized our understanding of language technologies and made significant advancements in the field. Their ability to create incredibly complex and rich context-based vector spaces opens a new area of analysis. Now, we are no longer limited by Word2Vec models but can analyze the vast variety of contextual meanings of individual words.
Thus, our first improvement of the method presented by Castro et al. [6] comes with the introduction of LLM embeddings instead of Word2Vec models. We use the pre-trained BERT Multilingual LLM [8] to extract the embeddings of tokens in our training datasets, as shown in Figure 1.

Figure 1: Our methodology

2.1 BERT as a Word to Vector Model
BERT (Bidirectional Encoder Representations from Transformers) [8] is a transformer-based model that has set new benchmarks in a variety of natural language and cross-language processing tasks [17]. Unlike previous models that processed text in a unidirectional way, BERT reads text bidirectionally, understanding the context of a word based on both its left and right surroundings. This bidirectional approach allows BERT to generate highly contextualized word embeddings.
The BERT transformer consists of multiple layers, where each layer is capable of capturing different aspects of the word's context. When we input a sentence into BERT, it tokenizes the sentence into subword units (tokens), processes these units through its multiple layers, and produces embeddings for each token at each layer. These embeddings are rich in context and can capture the nuances of word meanings in different sentences.

2.2 Embedding Extraction Process
To utilize BERT embeddings, we follow a systematic approach to extract and aggregate these embeddings:
(1) Tokenization: Using BERT's tokenizer, we split each word into its constituent subword tokens. This step ensures that even unfamiliar words or misspellings can be processed effectively by BERT.
(2) Contextual Embedding Extraction: We pass these tokens through the pre-trained BERT model to obtain embeddings. Since BERT embeddings are context-dependent, the same word can have different embeddings based on its surrounding words.
(3) Averaging Token Embeddings: For words split into multiple tokens, we compute the final word embedding by averaging the embeddings of all its constituent tokens. This aggregated embedding represents the word in its specific context within the sentence.
2.3 Classification of False Friends
Instead of creating a linear transformation between vector spaces, we use the embeddings of pairs of words (correlated in both of the languages) and a training dataset of already classified pairs, such as the Spanish-Portuguese dataset [7], to train a multi-layer perceptron neural network to classify pairs of unseen words as false or true friends.
The resulting embedding vector for each word is computed as the average of BERT's internal embeddings for each token comprising the word, thus leveraging the fixed internal embedding dimensions of BERT, where each token is represented by a vector v ∈ R^768. We found that a simple dense neural network is enough for our methodology. This neural network has two hidden layers of 2000 neurons each, enough to get satisfactory results.
2.4 Creation of Ground Truth Dataset
To extend the evaluation of our method, we needed to create a new ground truth dataset, which would consist of a collection of true and false friends. The prerequisite for obtaining said collection was processing a Slovenian [1] and a Macedonian corpus [3]. The former was obtained from The Slovenian Academy of Sciences and Arts, while the latter was obtained from the University of Leipzig. The Slovenian corpus was an official list of unique Slovenian words with 354205 different headwords, while the Macedonian corpus consisted of 350921 words obtained from Wikipedia. The latter was processed using our unique word extractor, which resulted in a unique word count of 248083. These two lists of unique words were then further processed in order to extract homographs, which are words with the same spelling, in this case pertaining to Slovenian and Macedonian words.
An initial hurdle was the difference in alphabets between the two languages. Slovenian uses the Latin alphabet, while Macedonian uses the Cyrillic alphabet. To overcome this, we transliterated the Macedonian corpus into the Latin alphabet. This allowed us to compare the two lists of unique words. We based our homograph extraction on a Levenshtein distance of 0, which meant that we only extracted homographs that were identical in spelling. Our extraction of homographs thus produced 21674 homographs and 226409 non-homographs. This stage of our corpus processing thus left us with 21674 candidates for false and true friends.
Moreover, because of the pretrained BERT Precision Recall F1-Score model, our method including the extraction of embeddings and False 0.7727 0.7730 0.7721 the training of the neural network to learn the classification of the True 0.7446 0.7424 0.7427 words took under 10 minutes as opposed to the training time of the Average 0.7586 0.7577 0.7574 Word2Vec models of around 3 hours. 35 Proc. of the 10th Student Computing Research Symposium (SCORES’24), Maribor, Slovenia, October 3, 2024 Mitko Nikov, Žan Tomaž Šprajc, and Žan Bedrač Our method of extracting homographs from our Slovenian and REFERENCES Macedonian corpus yielded 21674 candidates for false/true friends. [1] 2013. ISJ SAZU - List of Slovenian words. http://bos.zrc-sazu.si/sbsj_en.html Further analysis using comparisons of associated English mean- [Online; accessed 2. Jun. 2024]. ings resulted in 14654 true and 7020 false friends. However, some [2] 2024. Cloud Translation API | Google Cloud. https://cloud.google.com/translate/ docs/reference/rest [Online; accessed 2. Jun. 2024]. of these were false positives due to the multi-meaning nature of [3] 2024. Wortschatz Leipzig Macedonian Corpora. https://wortschatz.uni-leipzig. various words. A manual review of the 7020 false friends gave us de/en/download/Macedonian [Online; accessed 2. Jun. 2024]. [4] Željko Agić, Nikola Ljubešić, and Danijela Merkler. 2013. Lemmatization and 151 ideal false friends that are largely free of true friend overlap. Morphosyntactic Tagging of Croatian and Serbian. ACL Anthology (Aug. 2013), We used these 151 ideal false friends as our Slovenian-Macedonian 48–57. https://aclanthology.org/W13-2408 false friends dataset. The false friends manual review was followed [5] E. Susanne Carroll. 1992. On cognates. Sage Journals 8 (jun 1992). Issue 2. https://journals.sagepub.com/doi/abs/10.1177/026765839200800201 by an extraction of 268 true friends from our initial set of 14654 [6] Santiago Castro, Jairo Bonanata, and Aiala Rosá. 2018. A High Coverage Method true friends. These 268 true friends then comprised our Slovenian-for Automatic False Friends Detection for Spanish and Portuguese. In Proceedings Macedonian true friends dataset. of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018), Marcos Zampieri, Preslav Nakov, Nikola Ljubešić, Jörg Tiedemann, Shervin Using our Slovenian-Macedonian dataset of false and true friends1, Malmasi, and Ahmed Ali (Eds.). Association for Computational Linguistics, Santa we achieved similar classification capabilities as with the Spanish-Fe, New Mexico, USA, 29–36. https://aclanthology.org/W18-3903 [7] María de Lourdes Otero Brabo Cruz. 2004. Diccionario de falsos amigos (espanol-Portuguese dataset. Our results can be seen in Table 4. All experi-portugues / portugues-espanol). https://ec.europa.eu/translation/portuguese/ ments were run on an Intel Core i7-9700K @ 3.60GHz and GeForce magazine/documents/folha47_lista_pt.pdf RTX 2070 SUPER GPU. [8] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. ACL Anthology (June 2019), 4171–4186. https://doi.org/10.18653/v1/N19-1423 4 CONCLUSIONS [9] Ljiljana Dolamic and Jacques Savoy. 2009. Indexing and stemming approaches for the Czech language. Information Processing & Management 45, 6 (Nov. 2009), In this paper, we presented a novel approach to exploring cross-714–720. https://doi.org/10.1016/j.ipm.2009.06.001 [10] Pedro Domínguez and Brigitte Nerlich. 
4 CONCLUSIONS
In this paper, we presented a novel approach to exploring cross-linguistic connections, specifically focusing on false friends, using Large Language Model embeddings and graph databases. Our methodology leverages the advanced capabilities of BERT for generating contextualized word embeddings and a graph-based representation to capture semantic relationships. We achieved classification performance on the Spanish-Portuguese false friend dataset with an F1 = 83.81% and classification performance on our Slovenian-Macedonian dataset of F1 = 80.41% using Multilingual BERT and a multi-layer perceptron neural network. BERT was not fine-tuned using any additional sentences.
Our results indicate that LLM embeddings significantly enhance the accuracy of false friend classification compared to traditional Word2Vec models. The use of a pretrained LLM also significantly reduced the time it takes to learn the classifications from the 3 hours needed to train the Word2Vec models to under 10 minutes solely for the training of the multi-layer perceptron classifier. This highlights the potential of using sophisticated language models for even more complex linguistic tasks, paving the way for more accurate and insightful cross-linguistic analysis.
A natural next step to enhancing our methodology would be incorporating larger and more diverse corpora. These would fine-tune the pre-trained BERT model on specific language pairs or domains, improving the contextual accuracy of embeddings. Moreover, larger corpora would yield additional false and true friends in our ground truth dataset. More advanced translation APIs would be capable of providing multiple translations for each word, which would result in fewer false positives when creating such a dataset. Furthermore, extending the methodology to other cross-linguistic phenomena, such as idiomatic expressions, cognates, and loanwords, would improve our understanding of language relationships. False and true friends are, therefore, the tip of a linguistic iceberg that calls for further exploration.

REFERENCES
[1] 2013. ISJ SAZU - List of Slovenian words. http://bos.zrc-sazu.si/sbsj_en.html [Online; accessed 2. Jun. 2024].
[2] 2024. Cloud Translation API | Google Cloud. https://cloud.google.com/translate/docs/reference/rest [Online; accessed 2. Jun. 2024].
[3] 2024. Wortschatz Leipzig Macedonian Corpora. https://wortschatz.uni-leipzig.de/en/download/Macedonian [Online; accessed 2. Jun. 2024].
[4] Željko Agić, Nikola Ljubešić, and Danijela Merkler. 2013. Lemmatization and Morphosyntactic Tagging of Croatian and Serbian. ACL Anthology (Aug. 2013), 48–57. https://aclanthology.org/W13-2408
[5] E. Susanne Carroll. 1992. On cognates. Second Language Research 8, 2 (June 1992). https://journals.sagepub.com/doi/abs/10.1177/026765839200800201
[6] Santiago Castro, Jairo Bonanata, and Aiala Rosá. 2018. A High Coverage Method for Automatic False Friends Detection for Spanish and Portuguese. In Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018), Marcos Zampieri, Preslav Nakov, Nikola Ljubešić, Jörg Tiedemann, Shervin Malmasi, and Ahmed Ali (Eds.). Association for Computational Linguistics, Santa Fe, New Mexico, USA, 29–36. https://aclanthology.org/W18-3903
[7] María de Lourdes Otero Brabo Cruz. 2004. Diccionario de falsos amigos (español-portugués / portugués-español). https://ec.europa.eu/translation/portuguese/magazine/documents/folha47_lista_pt.pdf
[8] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. ACL Anthology (June 2019), 4171–4186. https://doi.org/10.18653/v1/N19-1423
[9] Ljiljana Dolamic and Jacques Savoy. 2009. Indexing and stemming approaches for the Czech language. Information Processing & Management 45, 6 (Nov. 2009), 714–720. https://doi.org/10.1016/j.ipm.2009.06.001
[10] Pedro Domínguez and Brigitte Nerlich. 2002. False friends: Their origin and semantics in some selected languages. Journal of Pragmatics 34 (Dec. 2002). https://doi.org/10.1016/S0378-2166(02)00024-3
[11] Ketevan Gochitashvili and Giuli Shabashvili. 2018. The issue of "false friends" in terms of learning a foreign language (using the example of the Georgian and English languages). International Journal of Multilingual Education VI (July 2018), 33–41. https://doi.org/10.22333/ijme.2018.11006
[12] Diana Inkpen and Oana Frunza. 2005. Automatic Identification of Cognates and False Friends in French and English. ResearchGate (Jan. 2005). https://www.researchgate.net/publication/237129220_Automatic_Identification_of_Cognates_and_False_Friends_in_French_and_English
[13] Nikola Ljubešić and Darja Fišer. 2013. Identifying false friends between closely related languages. ACL Anthology (Aug. 2013), 69–77. https://aclanthology.org/W13-2411
[14] Long Ma and Yanqing Zhang. 2015. Using Word2Vec to Process Big Text Data. (Oct. 2015). https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7364114
[15] Tomas Mikolov, Greg Corrado, Kai Chen, and Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. 1–12. https://www.researchgate.net/publication/319770439_Efficient_Estimation_of_Word_Representations_in_Vector_Space
[16] Ruslan Mitkov, Viktor Pekar, Dimitar Blagoev, and Andrea Mulloni. 2007. Methods for extracting and classifying pairs of cognates and false friends. Machine Translation 21, 1 (March 2007), 29–53. https://doi.org/10.1007/s10590-008-9034-5
[17] Telmo Pires, Eva Schlinger, and Dan Garrette. 2019. How Multilingual is Multilingual BERT?. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Florence, Italy, 4996–5001. https://doi.org/10.18653/v1/P19-1493
[18] Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, and Furu Wei. 2023. Improving Text Embeddings with Large Language Models. arXiv (Dec. 2023). https://doi.org/10.48550/arXiv.2401.00368 arXiv:2401.00368
Analyzing tourist destinations in Belgrade using geotagged photos from Flickr

Vera Milosavljević (vera.milosavljevic@famnit.upr.si), University of Primorska, Faculty of Mathematics, Natural Sciences and Information Technologies, Department of Mathematics, Koper, Slovenia
Dejan Paliska (dejan.paliska@fts.upr.si), University of Primorska, Faculty of Tourism, Department for Sustainable Destination Development, Koper, Slovenia

DOI: https://doi.org/10.18690/um.feri.6.2024.8    ISBN 978-961-286-914-4

ABSTRACT
This research aims to analyze tourist destinations in Belgrade by defining trajectories of movement of the users of the platform Flickr using geotagged photos. We defined tourist movements and used generalization techniques to identify the main tourist locations. We applied several techniques to identify frequently visited locations and predict the next possible tourist spots. Our findings provide insights into popular travel patterns and suggest potential areas for tourism development.

KEYWORDS
Tourism data analysis, active user bias, geotagged photo, DBSCAN clustering, trajectory generalization

1 INTRODUCTION
Tourism is a major contributor to the global economy, offering economic benefits and fostering cultural exchange. With the growth of social media and mobile phone usage, tourists now document their journeys through geotagged photos, sharing their experiences with a wide audience. Flickr, a popular photo-sharing platform, contains a wide repository of such geotagged images and a publicly available API, which makes it a good choice for our analysis. Users of the platform are often professional photographers, and their focus lies on architecture, nature, and a country's most beautiful destinations.
Analyzing these photos provides valuable insights into tourists' behavior at destinations, and particularly into their space-time patterns. Traditional tourism data often relies on surveys and official records, which can be biased and limited in scope. In contrast, social media data offers real-time, user-generated information that reflects actual tourist activities and interests. Flickr has many active users who contribute to the platform daily.
In this paper, we focused on Belgrade as a popular tourist destination, but the insights this paper provides can be applicable to any location that has a rich dataset of photos on the Flickr platform.

2 OBJECTIVES
This research has the following objectives:
(1) Collecting geotagged photos from Flickr by using the publicly available Flickr API.
(2) Data preprocessing. Defining the set of variables that are important for further analysis.
(3) Generating trajectories of movement by using geo-coordinates for each user.
(4) Trajectory analysis and visualisation for better understanding of the movements.
(5) Clustering, aggregation and generalisation of trajectories.
(6) Prediction of the next tourist movement based on several different modeling techniques.
3 SIGNIFICANCE
Understanding the location preferences of visitors is crucial for local tourism organizations, travel agencies, and other stakeholders involved in destination development. Information about local attractions, visitor mobility, and intradestination movement patterns can help with strategic planning, improve destination marketing, and enhance connectivity between attractions. By using modern technology and user-generated content, such as geotagged photos from social media, researchers can better understand visitor patterns and behaviors. In Belgrade, the relationships between tourist destinations and visitor mobility are under-researched, thus this study aims to address these gaps by analyzing data from the Flickr platform. This will allow us to identify tourist POIs (clusters) and their mobility patterns at destinations. Additionally, while Flickr data could also be utilized to analyze tourists' emotions and destination image, these aspects are beyond the scope of this study. This research contributes to the academic field by demonstrating the application of data mining techniques in tourism analytics.

4 STRUCTURE
The rest of this paper is structured as follows: Section 5 details the methodology, including data collection, preprocessing, clustering, aggregation and generalisation, and the prediction techniques used. Section 6 presents the results of our analysis, highlighting key findings and patterns. Section 7 concludes the paper and suggests directions for future research.

5 METHODOLOGY
5.1 Tools and Technologies
All data processing and analysis were done using the Python programming language, using some of the many data science libraries. The following tools were used:
• Jupyter Notebook: Computing environment that was used for the development and documentation of the code, along with the miniconda installer and command line.
• Pandas: Used for data manipulation and analysis.
• Numpy: For numerical computations.
• Scikit-learn: Applied for clustering and other machine learning tasks.
• Geopandas: Used for geographic data processing and visualization.
• MovingPandas: Used for spatial analysis and manipulation of geospatial data.
• Mlxtend: Employed for the implementation of the Apriori algorithm and association rule mining.
• SeqMining: Applied for sequence mining to find frequent sequences.
• Matplotlib: Used for data visualization.

5.2 Data Collection
We collected 31019 geotagged photos from 1233 different users from the Flickr application. We used the Flickr public API, access to which was granted via a unique key after becoming a user of the Flickr app. The dataset was focused on the territory of Belgrade, by using the tag "belgrade". Each photo's metadata, including the geocoordinates and timestamps, was extracted and used for further analysis.
5.3 Data Preprocessing
The preprocessing step involved cleaning and structuring the collected data to make it suitable for analysis. We defined a variable ownerID, which represents one user on one day, and a variable locID, which represents the different locations of each user. The algorithm for defining locID is shown as Algorithm 1. The motivation for defining this variable is that we wanted to tackle the problem of an active user: a situation where one user generates many photos of the same location. We wanted to treat these photos as a single group, with the same value of the locID variable, which ensures that our analysis is less biased.

```
Data:   database with columns 'owner_ID' and 'distance'
Result: database with a 'loc_ID' column assigned to every row,
        defining the location group

/* Define thresholds */
dist_threshold = 0.0005
cumulative_dist_threshold = 0.001
/* Set initial loc_ID for each row */
final_database["loc_ID"] = 1
/* Group data by owner_ID and list distances */
owner_ID_map = group database by owner_ID and aggregate distances
/* Initialize row index to zero */
i = 0
foreach key in owner_ID_map do
    prev = 1                /* Initial loc_ID for current owner_ID */
    cumulative_dist = 0     /* Cumulative distance for current loc_ID */
    foreach list_value in owner_ID_map[key] do
        cumulative_dist = cumulative_dist + list_value   /* Accumulate distance */
        if list_value > dist_threshold or
           cumulative_dist > cumulative_dist_threshold then
            prev = prev + 1        /* Increment loc_ID */
            cumulative_dist = 0    /* Reset cumulative distance */
        end
        database.loc[i, "loc_ID"] = prev   /* Update loc_ID for current row */
        i = i + 1                          /* Move to next row */
    end
end

Algorithm 1: loc_ID Algorithm
```

The reduced dataset had the structure shown in Fig. 1. We dropped rows that were out of the boundaries of Belgrade: to be considered a photo captured on the territory of Belgrade, a row must have a longitude value between 20.35 and 20.65 and a latitude value between 44.7 and 45.1. The final reduced dataset had 20,723 rows from 550 users.

Figure 1: Reduced dataset
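Algorithm 1 translates almost line for line into pandas. The following sketch assumes the DataFrame is sorted by owner and capture time, with each owner's rows stored contiguously, and that the distance column holds the coordinate-space gap to the owner's previous photo (the same assumptions Algorithm 1 makes implicitly).

```python
import pandas as pd

DIST_THRESHOLD = 0.0005            # per-photo gap threshold (coordinate units)
CUMULATIVE_DIST_THRESHOLD = 0.001  # drift threshold for one location group

def assign_loc_id(df: pd.DataFrame) -> pd.DataFrame:
    """Assign a loc_ID to every row, following Algorithm 1."""
    df = df.copy()
    df["loc_ID"] = 1                                   # initial loc_ID for each row
    loc_col = df.columns.get_loc("loc_ID")
    # One list of consecutive distances per owner, in row order.
    owner_map = df.groupby("owner_ID", sort=False)["distance"].apply(list)

    i = 0                                              # global row index
    for distances in owner_map:
        prev = 1                                       # initial loc_ID for this owner
        cumulative = 0.0                               # distance accumulated in group
        for d in distances:
            cumulative += d                            # accumulate distance
            if d > DIST_THRESHOLD or cumulative > CUMULATIVE_DIST_THRESHOLD:
                prev += 1                              # start a new location group
                cumulative = 0.0                       # reset cumulative distance
            df.iloc[i, loc_col] = prev                 # update loc_ID for current row
            i += 1
    return df
```

Photos taken close together in space thus share one loc_ID, so a burst of photos by a single active user at one site counts as a single visit rather than many.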
5.4 Clustering
Various clustering techniques were explored to identify clusters of frequently visited locations. We experimented with the DBSCAN, HDBSCAN, and OPTICS algorithms. The centroids of the clusters were identified using mean and median methods. We created visualizations to show the clustering results, trajectories, and connections between clusters.

5.5 Generalization and aggregation
Generalization techniques were applied using the Geopandas library and the Douglas-Peucker generalizer [5]. Different threshold parameters for tolerance were tested to observe the effects on generalization, and a comparison was made between the Douglas-Peucker generalizer and the Top-Down Time Ratio generalizer. We defined a trajectory collection using the MovingPandas [3] library, which is a collection of all trajectories for each ownerID, and used this collection as an input parameter for the DouglasPeuckerGeneralizer algorithm, also defined in the MovingPandas library. Fig. 2 shows the trajectories for each ownerID. We used the TrajectoryCollectionAggregator [4] from the MovingPandas library to obtain significant points and the flows between them. The results are shown in the Results section.

Figure 2: Trajectories for each OwnerID

5.6 Predictive Modeling
In predictive modeling, our approach was to predict the next cluster in our spatial data analysis based on the set of visited clusters. The clustering technique that we chose was HDBSCAN, because we wanted to focus on smaller clusters of tourist destinations within the city. Initially, we defined individual paths for each owner based on their clusters. We excluded outliers marked as -1 (noise) and grouped the remaining data by owner ID. Additionally, we defined a DataFrame with each owner's ID and their corresponding cluster paths. After computing the unique sequence of clusters for each owner, we generated trigrams [2] to identify recurring patterns within the paths. We defined the transition matrix and normalized the transition counts to derive probabilities of transitioning from one cluster to another. We also developed functions for predicting the next cluster using Markov chains, Monte Carlo simulations, and association rules derived from Apriori analysis.
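The transition-matrix construction and the Markov-chain and Monte Carlo predictions described above can be sketched as follows. The helper names and the toy paths are illustrative; paths stands for the per-owner cluster sequences with HDBSCAN noise already removed.

```python
import numpy as np

def build_transition_matrix(paths, n_clusters):
    """Count cluster-to-cluster transitions and row-normalize them into probabilities."""
    counts = np.zeros((n_clusters, n_clusters))
    for path in paths:                         # one cluster sequence per owner
        for a, b in zip(path, path[1:]):
            counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(row_sums > 0, counts / row_sums, 0.0)

def predict_next(probs, current):
    """Markov-chain prediction: most probable next cluster after the current one."""
    return int(np.argmax(probs[current]))

def monte_carlo_next(probs, start, n_trials=1000, seed=0):
    """Monte Carlo prediction: repeatedly sample the next cluster and take the mode.
    Assumes the start cluster has at least one outgoing transition."""
    rng = np.random.default_rng(seed)
    samples = rng.choice(len(probs), size=n_trials, p=probs[start])
    return int(np.bincount(samples).argmax())

# Toy example with hypothetical cluster labels:
paths = [[5, 7, 3], [5, 7], [3, 5, 7]]
P = build_transition_matrix(paths, n_clusters=8)
print(predict_next(P, current=5))    # -> 7 in this toy example
```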
6 RESULTS
6.1 Clustering
Here we present the visualisation of the clustering results using the HDBSCAN algorithm and the median method for calculating the centroid of each cluster. Each color in Fig. 3 represents a different cluster.

Figure 3: Clustering

We calculated the number of transitions between each pair of clusters (if one person visited cluster 1 and then visited cluster 3, we counted that as one transition between clusters 1 and 3). Fig. 4 shows the connections between clusters. Purple points represent the centroids of the clusters; the thickness of the line and the number represent how many connections exist between two clusters.

Figure 4: Connections between clusters

6.2 Generalization
Fig. 5 shows different thresholds for generalization by Douglas-Peucker. By analyzing these input parameters, we opted for a threshold of 0.01 as the most suitable for our research. We noticed only a small difference between the original and generalized trajectories, because our original trajectories were already filtered and reduced. We also experimented with the TopDownTimeRatio generalizer. The comparison for a single trajectory when using the DP generalizer and the TDTR generalizer can be seen in Fig. 6.

Figure 5: Different thresholds for a singular trajectory
Figure 6: Comparison between TDTR and DP for a singular trajectory

6.3 Aggregation
The input parameter for the TrajectoryCollectionAggregator was the set of trajectories generalized with the Douglas-Peucker generalizer, as instructed in the documentation of the MovingPandas library. The significant points and the aggregated flows between them obtained with this method are shown in Fig. 7. We can observe that this method of reduction gave us results similar to clustering, identifying the centre of the city with Kalemegdan park as the most popular tourist area.

Figure 7: Aggregated flows and more popular areas

6.4 Predictive Modeling
The Markov chain analysis provided insights into the probabilities of transitions between clusters. By normalizing transition counts into probabilities, we gained insights into the likelihood of tourists moving from one cluster to another. Through Markov chain modeling, we determined the most probable next cluster after a given current cluster. For instance, after analyzing a sequence with cluster 5, which corresponds to New Belgrade, the predicted next cluster was 7, which corresponds to Zemun. This aligns geographically, since these two locations are close. Additionally, employing Monte Carlo simulations, we ran more trials to predict future clusters based on starting clusters. Lastly, using association rules derived from Apriori [1] analysis, we predicted the next cluster by identifying the longest subsequence matching the antecedents in the rules. For instance, when the current sequence contained cluster 36 (representing the Victor statue in Kalemegdan), the predicted next cluster was 38 (representing the Roman well in Kalemegdan). Example output of the Apriori algorithm can be seen in Fig. 8.

Figure 8: First two rows of Apriori algorithm output

7 CONCLUSION
This research introduces some improvements in analyzing tourist activities at destinations by using geotagged photos from Flickr, focusing on tourist movement patterns in Belgrade. A key innovation of this study is the development and application of the locID algorithm, which identifies unique tourist POIs and deals with the issue of overrepresentation from highly active users. This approach ensures that the analysis is more representative and less biased compared to other methods. The clustering results provided a view of frequently visited locations, offering valuable insights into the spatial distribution of tourist activity. By employing generalization techniques, the study effectively simplified tourist trajectories, making the data easier to interpret. Predictive modeling techniques, such as Markov chains, Monte Carlo simulations, and the Apriori algorithm, were used to predict future tourist movements. These methods demonstrated their potential for practical applications in tourism management, enabling more targeted marketing strategies and improved visitor experience planning.

ACKNOWLEDGMENT
The author would like to thank mentor professor Dejan Paliska for the support, cooperation, and availability during the whole process of the research, as well as the University of Primorska and the representative for this seminar, professor Miklós, for supporting this project.

REFERENCES
[1] P. Fournier-Viger. SPMF: An Open-Source Data Mining Library. [Online]. Available: https://www.philippe-fournier-viger.com/spmf/. [Accessed: 03-Jun-2024].
[2] J. Silge and D. Robinson. Text Mining with R: A Tidy Approach. O'Reilly Media, 2017. [Online]. Available: https://www.tidytextmining.com/ngrams. [Accessed: 03-Jun-2024].
[3] A. Graser. MovingPandas: A Python Library for Movement Data Analysis. [Online]. Available: https://movingpandas.readthedocs.io/en/main/. [Accessed: 03-Jun-2024].
[4] A. Graser. Trajectory Aggregator in MovingPandas. [Online]. Available: https://movingpandas.readthedocs.io/en/main/trajectoryaggregator.html. [Accessed: 03-Jun-2024].
[5] A. Graser. Movement Analysis Tools on GitHub. [Online]. Available: https://github.com/anitagraser/movement-analysis-tools. [Accessed: 03-Jun-2024].
Volleyball Game Analysis Using Computer Vision Algorithms

Marko Plankelj (marko.plankelj@student.um.si), Uroš Mlakar (uros.mlakar@um.si), University of Maribor, Faculty of Electrical Engineering and Computer Science, Maribor, Slovenia
DOI: https://doi.org/10.18690/um.feri.6.2024.9

ABSTRACT
In recent years, modern technologies have made sports more accessible to a wider audience by providing interactive data during broadcasts, reducing the risk of human error, and enhancing athletes' performance through real-time analysis and targeted training insights. This paper combines theoretical and practical approaches by developing an application based on specific convolutional neural networks for volleyball court detection and ball tracking. The results demonstrate the potential of advanced video analytics in sports, allowing users to explore the opportunities of modern technology in improving sports performance.

KEYWORDS
computer vision, convolutional neural networks, perspective transformation, volleyball, web application

1 INTRODUCTION
For people, sport has always been a form of relaxation and socializing, and an opportunity to find new friendships, with proven benefits for health. The World Health Organization recommends playing sports as part of a healthy lifestyle, in a program that aims to bring people closer to physical activity as an opportunity for a healthier, happier, and more productive life [11]. At the same time, we observe the day-to-day development of modern technologies and their use in a wide variety of fields, including sports. Sport and modern technologies such as computer vision and machine learning were completely opposite terms a few years ago, but nowadays we can hardly imagine watching or participating in sports without the inclusion of modern technologies [5]. Despite many challenges, such as poorer video quality or overlapping players, the use of systems for detecting the field, the players, and their actions, and for tracking the ball during play is becoming more common, whether for real-time analysis of the opposing team, which helps coaches find winning tactics, or simply for watching a sports broadcast during which the directors serve interactive slow-motion footage of the actions and statistical data about the players. By using such systems, the aim is to prevent controversial situations in matches, improve training and competitor analysis, predict loads in training and matches with the goal of preventing injuries, and improve the experience of spectators with analysis before, during, and after the match.

2 PROBLEM DESCRIPTION
The main objective of this paper was to develop a web application that enables users to select a video of a volleyball game and then, with the help of computer vision algorithms, automatically detect the volleyball court, analyze the trajectory of the ball, and display the results through a perspective projection of the court. The computer vision algorithms utilized in this paper were convolutional neural networks (CNNs). The addressed challenge can be broken down into four steps: 1) preparation of the training data, which involved collecting, preprocessing, and labeling the data, which we later augmented with the aim of increasing the diversity and volume of the dataset; 2) implementation and training of a CNN based on the prepared dataset; 3) perspective transformation of the volleyball court; and 4) development of the web application and integration of the trained CNN models, in connection with a perspective projection, to display the final results.
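Step 1's video-to-frame conversion (detailed in Section 4.1) reduces to a short OpenCV loop. This is a sketch under the stated one-frame-per-second rule; the output naming and target resolution are illustrative assumptions.

```python
import cv2

def extract_frames(video_path: str, out_dir: str, size=(1280, 720)) -> int:
    """Save one JPG frame per second of video, resized to a unified resolution."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30      # fall back if FPS metadata is missing
    step = int(round(fps))                     # keep every fps-th frame = 1 per second
    saved, index = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            frame = cv2.resize(frame, size)    # unify dimensions across sources
            cv2.imwrite(f"{out_dir}/frame_{saved:05d}.jpg", frame)
            saved += 1
        index += 1
    cap.release()
    return saved
```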
3 RELATED WORK
The use of modern technologies is no longer limited to pilot projects and lower-ranked events, but is gaining ground at the highest sporting events in the world. At the Olympic Games in Paris, in collaboration with Intel, technologies were introduced to improve the experience of participants and spectators. The International Olympic Committee presented a program to use artificial intelligence in sports to improve athlete performance and the spectator experience [6].

At the 2022 FIFA World Cup in Qatar, semi-automated technology was used to check offside positions. The technology uses twelve cameras placed under the roof of the stadium to calculate the position of twenty-nine key points on each player fifty times per second. To accurately detect the impact of the ball, an IMU sensor is used, placed in the middle of the ball and sending data five hundred times per second [4].

In sports such as tennis and volleyball, the Hawk-Eye system has been used for many years to track the path of the ball and determine its position with the help of high-speed cameras placed around the playing surface. It identifies the pixels in each frame that correspond to the ball and then compares its position using at least two image frames recorded from different camera angles to confirm or correct the position accordingly [10]. As already mentioned, volleyball, like other sports, is no exception when it comes to the introduction of advanced technologies. The Balltime platform, with its volleyball artificial intelligence (VOLL-E), divides the volleyball game into segments, facilitating match analysis and player preparation. Using a CNN model trained on numerous volleyball videos, it enables automatic detection of the ball and players on the court and labeling with bounding boxes. Based on the recognized positions of the ball and players, the platform recognizes game actions such as reception or defense, automatically determines the direction of the attack, and displays it visually. It also calculates the ball speed, lets users sort game elements by individual, and integrates with YouTube to share clips [2]. Similar functionality to that of the Balltime platform is also provided by an Apple mobile application called Avais, which monitors and analyzes the volleyball game in real time. As a result, users avoid waiting for a video of a volleyball match to load and can obtain data for analysis at the same time [1].

4 IMPLEMENTATION
The final solution was implemented in several steps using various technologies. To ensure accessibility across different devices and locations, a web application was developed, leveraging trained neural network models for detecting volleyball courts and tracking volleyball movements through image frames.

4.1 Collection, preprocessing, labeling, and augmentation of training data
We started with collecting, preprocessing, labeling, and augmenting the training data. Primary data collection was performed using freely available data from the internet. Due to a lack of sufficient data, we added our own recordings of volleyball matches to the training set. We implemented a script in Python to convert the video sequences into image frames, saved in JPG (Joint Photographic Experts Group) format at a rate of one frame per second. Because the training data came from different sources with different dimensions, we unified the image dimensions.

Once the dimensions of the image frames were standardized, we proceeded with labeling the training data, using two separate methods. For the first method, we manually selected six points on each image frame and then, for each of the six selected points, recorded the x and y coordinates in JSON (JavaScript Object Notation) format for later use, as shown in Figure 1.

Figure 1: Manual labeling of training data points.

For the second technique, used to label the data in the training set for ball detection, we utilized the functionality of the web platform Roboflow [8], which provides developers with comprehensive services for building computer vision applications, including data labeling for training sets.

Due to the small number of images in the training set, we decided to augment the data, as shown in Figure 2. We utilized imgaug, a dedicated library for augmenting training sets using the various augmentation techniques it supports [3]. To retain the entire court in the image after augmentation, we read the coordinates of the volleyball court's corners and determined the appropriate transformations based on their distance from the image edge. The transformations were performed in random order using the Sequential function. After the transformations, we checked whether all the marked points of the court remained within the image window; if any point was outside, we repeated the process up to five times and, in case of failure, applied horizontal flipping.

Figure 2: Original image (left) and rotated augmented image (right).
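The augmentation step can be sketched with imgaug. The specific augmenters below are illustrative (the paper does not list the exact transformations), but the random order via Sequential, the in-bounds check on the six court points, the five retries, and the horizontal-flip fallback follow the description above.

```python
import imgaug.augmenters as iaa
from imgaug.augmentables.kps import Keypoint, KeypointsOnImage

# Illustrative augmenters; the exact transformations used are not stated in the paper.
seq = iaa.Sequential(
    [iaa.Affine(rotate=(-15, 15)), iaa.Affine(scale=(0.9, 1.1)), iaa.GaussianBlur((0, 1.0))],
    random_order=True,                 # transformations applied in random order
)

def augment_with_court(image, court_points, max_tries=5):
    """Augment an image while keeping all six court points inside the frame."""
    kps = KeypointsOnImage([Keypoint(x=x, y=y) for x, y in court_points],
                           shape=image.shape)
    h, w = image.shape[:2]
    for _ in range(max_tries):
        img_aug, kps_aug = seq(image=image, keypoints=kps)
        if all(0 <= kp.x < w and 0 <= kp.y < h for kp in kps_aug.keypoints):
            return img_aug, kps_aug
    # Fallback after five failed attempts: a horizontal flip keeps points in-frame.
    return iaa.Fliplr(1.0)(image=image, keypoints=kps)
```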
4.2 Perspective transformation
Perspective transformations have been applied in various fields, including autonomous driving, where footage from multiple cameras mounted on a vehicle is reshaped using a perspective transformation to produce a bird's-eye view covering the entire surroundings of the car, making it easier to assess distances between objects around the vehicle [7]. In our case, we wanted to create a bird's-eye view, that is, a top-down perspective of the volleyball court, to facilitate easier subsequent game analysis. Using OpenCV, a library specialized in computer vision, we first calculated the homography matrix, which was then used to transform the original perspective into a top-down view, as shown in Figure 3.

Figure 3: Original image with court edges marked by green dots (left) and image after perspective transformation (right).
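A sketch of the homography step with OpenCV follows. The pixel coordinates are illustrative, and since the paper does not state the output scale, mapping the 9 x 18 m court at 50 px per meter is an assumption.

```python
import cv2
import numpy as np

# Four court corners in the camera frame (pixels), illustrative values.
# With all six labeled points, cv2.findHomography could fit the matrix instead.
src = np.float32([[412, 310], [905, 318], [1130, 640], [180, 628]])
# The same corners in the top-down view: a 9 x 18 m court at 50 px per meter.
dst = np.float32([[0, 0], [450, 0], [450, 900], [0, 900]])

H = cv2.getPerspectiveTransform(src, dst)            # 3x3 homography matrix

frame = cv2.imread("frame_00000.jpg")
topdown = cv2.warpPerspective(frame, H, (450, 900))  # bird's-eye view of the court

# The same matrix maps detected ball positions into court coordinates:
ball_px = np.float32([[[640, 400]]])                 # shape (1, 1, 2) as OpenCV expects
ball_court = cv2.perspectiveTransform(ball_px, H)
```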
4.3 Implementation and training of CNN models
For training, we chose two separate convolutional neural network models: the segmentation neural network U-Net, which was used for detecting the volleyball court, and the YOLOv8 model, which was used for detecting the ball. In both cases, we followed an approach where separate datasets were used for training and testing, meaning that none of the images used during training were later used for testing the performance of either neural network model. The segmentation neural network U-Net, which has a symmetrical structure provided by its encoder and decoder parts, was implemented using the open-source machine learning framework PyTorch. For the YOLO model, we decided to use one of the newer versions, specifically version eight, developed by Ultralytics [9]. Although we could have conducted the training on Ultralytics' platform, we preferred to install the Ultralytics library locally and integrate it into our framework. For training purposes, we used a pre-trained YOLOv8 model, which we further fine-tuned with our own previously labeled data, stored in the YAML format (a data serialization format) generated on the Roboflow platform during data annotation.
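With the Ultralytics library installed locally, the fine-tuning just described reduces to a few calls. The checkpoint size, epoch count, and file names are illustrative assumptions; the paper only states that a pre-trained YOLOv8 model was fine-tuned on the Roboflow-exported YAML dataset.

```python
from ultralytics import YOLO

# Start from pre-trained weights and fine-tune on the annotated ball dataset.
model = YOLO("yolov8n.pt")                   # pre-trained checkpoint (size is an assumption)
model.train(data="ball_dataset/data.yaml",   # dataset descriptor exported by Roboflow
            epochs=100, imgsz=640)

# Inference on a single frame: each detection carries a bounding box and confidence.
results = model("frame_00000.jpg")
for box in results[0].boxes:
    x1, y1, x2, y2 = box.xyxy[0].tolist()    # bounding-box corners in pixels
    print(x1, y1, x2, y2, float(box.conf))
```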
4.4 Web application
To ensure better accessibility, regardless of the location and device of the end user, a web application was developed. The main purpose of this application is to use the trained neural network models for detecting the volleyball court and tracking the volleyball through image frames during a short segment of a volleyball match. For development we used the Flask web microframework for the backend of the application, while the interface was enhanced and improved using the open-source CSS framework Bootstrap.

5 RESULTS
The final solution is essentially divided into two main parts. In the first part, the user can choose between previously prepared clips or select their own video from the device through which they are accessing the web application. After selecting the video, the user can start the analysis by clicking a button. The analysis is divided into four key components, as shown in Figure 4:
• detection of the volleyball court using the trained CNN U-Net model,
• tracking and detecting the volleyball using the CNN model YOLOv8,
• perspective transformation of the volleyball court,
• attack direction, where the movement of the volleyball is shown through a sequence of image frames.

Figure 4: Web application steps: top left - court detection with U-Net, top right - ball tracking with YOLOv8, bottom left - perspective transformation, bottom right - ball trajectory across frames.

Despite the successful implementation of the desired functionalities, the application does not perform perfectly in specific situations. This becomes evident mainly in cases where the input video contains image frames different from those used to train the CNNs. The most common issues we observed were:
• Different camera positions: when the volleyball match is recorded from a different camera angle than those used in the training dataset.
• Multiple lines on the court: when the video includes multiple lines that do not pertain solely to the volleyball court.
• Color scheme of the ball: in some leagues (even top leagues, for example in Italy), a ball of a different color is used. Additionally, the color scheme of the court might overlap with the ball, making it challenging to detect the ball accurately.
• Key court points or the ball covered by players: when key points of the court are covered by players, as shown in Figure 5, or by other obstacles (e.g., the net covering the line of the court farthest from the camera if the camera is positioned too low), resulting in difficulties in detecting the volleyball court or the ball.
• Real-time processing limitations: depending on the computational resources and the complexity of the web application, real-time streaming and analysis of video (especially at higher resolutions) might introduce latency or affect performance.

Figure 5: Key court points covered by players.

6 CONCLUSIONS
In this paper, we presented the process from the initial idea to a functional web application that provides a solution for the original concept: the automatic analysis and visual presentation of the results to the user. The methods used were based on collecting, preparing, labeling, and augmenting data, which were then used to train two CNNs. The trained models were then used, in conjunction with the perspective transformation of the court, to analyze and visually present the results to the user. The final result was presented as a user-friendly web application, where the user can select a desired video and receive a basic analysis within seconds, including court detection and volleyball tracking.

However, despite a successful implementation, the application has some weaknesses and scenarios where the results are not as expected. Challenges include different camera positions, diverse ball color schemes or multiple balls at the same time, and interference that may obscure key court points or the ball itself. Additionally, real-time processing can be affected by computational limitations and video quality, potentially impacting performance.

Nonetheless, the implemented application serves as a foundation that allows for numerous upgrades, which could be inspired by existing solutions and improve upon their shortcomings. As a final result, we could provide users with real-time statistics of a volleyball match, and the platform could be expanded to include other sports, thereby attracting a wider range of users. All these reasons encourage the realization that the integration of modern technologies into all segments of our lives, including sports, is no longer a binary question but merely a matter of time.

REFERENCES
[1] Avais. 2023. Our Features. https://www.avais.ai/features. Accessed: 2024-06-02.
[2] Balltime Academy. 2024. What is Volleyball AI. https://academy.balltime.com/getting-started/what-is-volleyball-ai. Accessed: 2024-06-02.
[3] Imgaug. 2020. Documentation. https://imgaug.readthedocs.io/en/latest/. Accessed: 2024-06-02.
[4] Inside FIFA. 2022. Semi-automated Offside Technology to be Used at FIFA World Cup 2022™. https://inside.fifa.com/technical/media-releases/semi-automated-offside-technology-to-be-used-at-fifa-world-cup-2022-tm. Accessed: 2024-06-02.
[5] B.T. Naik, M.F. Hashmi, and N.D. Bokde. 2022. A Comprehensive Review of Computer Vision in Sports: Open Issues, Future Trends and Research Directions. Applied Sciences 12, 9 (2022), 4429.
[6] Olympics. 2024. IOC Takes the Lead for the Olympic Movement and Launches Olympic AI Agenda. https://olympics.com/ioc/news/ioc-takes-the-lead-for-the-olympic-movement-and-launches-olympic-ai-agenda. Accessed: 2024-06-02.
[7] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. 2016. You Only Look Once: Unified, Real-Time Object Detection. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 779-788.
[8] Roboflow. 2024. Our Company. https://roboflow.com/about. Accessed: 2024-06-02.
[9] Ultralytics. 2024. YOLOv8 Models Documentation.
[10] Wikipedia. 2024. Hawk-Eye. https://en.wikipedia.org/wiki/Hawk-Eye. Accessed: 2024-06-02.
[11] World Health Organization. 2024. Sports and Health Initiative. https://www.who.int/initiatives/sports-and-health. Accessed: 2024-08-30.
A Bayesian Approach to Modeling GPS Errors for Comparing Forensic Evidence

Nika Molan (nm83087@student.uni-lj.si), Ema Leila Grošelj (eg61487@student.uni-lj.si), Klemen Vovk (kv4582@student.uni-lj.si), University of Ljubljana, Faculty of Computer and Information Science, Ljubljana, Slovenia
DOI: https://doi.org/10.18690/um.feri.6.2024.10

ABSTRACT
This paper introduces a Bayesian approach to modeling GPS errors for comparing forensic evidence, addressing the challenge of determining the most likely source of a single GPS localization given two proposed locations. We develop a probabilistic model that transforms GPS coordinates into polar coordinates, capturing distance and directional errors. Our method employs Markov chain Monte Carlo (MCMC) sampling to estimate the data-generating processes of GPS measurements, enabling robust comparison of potential device locations while quantifying uncertainty. We apply this approach to three datasets: one from the existing literature and two newly collected datasets from Ljubljana and Novo mesto. The result is a posterior distribution of log-likelihood ratios directly comparing the two propositions, which can be transformed into likelihood ratios to comply with current standards in forensic science.

KEYWORDS
device geolocation as evidence, MCMC, digital forensics, likelihood ratio, Stan, Bayesian inference

1 INTRODUCTION
GPS evidence has proven problematic in court, as it has often been dismissed or not presented at all [1] due to the fear of wrong judgment, or of discrediting other evidence, because of the uncertainty of GPS measurements. Our goal is to provide a Bayesian approach for evaluating a single GPS localization in light of two proposed locations. To give more context to why such approaches are needed, consider the following problem: a single GPS localization (evidence point E) was extracted from a device D found at the crime scene. Since E is a GPS measurement, there is inherently some measurement error. Additionally, someone could have moved the device D during or after the crime. Investigators propose two geographical locations (P1 and P2) where the device could have been when it measured E, and we want to identify which of the two propositions is more likely. Since the conclusion is to be presented in court, we need to provide sufficiently precise verbal equivalents of the results while not misleading or misrepresenting the weight of a piece of evidence.

The contributions of this paper are summarized as follows:
• a Bayesian statistics approach utilizing Markov chain Monte Carlo sampling to estimate the data-generating processes (DGPs) of GPS measurements taken from different locations, in order to compare which DGP most likely generated a single GPS measurement obtained as forensic evidence,
• a code implementation of the proposed approach, along with MCMC diagnostics, results, and visualizations, made available at [2],
• two GPS measurement datasets collected in Ljubljana and Novo mesto to aid further research.
2 RELATED WORK
The increased availability of GPS logs from smartphones, activity trackers, navigation, and autonomous vehicles has increased the use of such digital evidence in court [1]. Due to the high risk of misinterpretation and of wrongful judgment or discrediting other evidence, statistical methods have been proposed [3] to quantify and reason under the uncertainty due to GPS measurement errors. The magnitude of GPS errors varies between devices and geographical locations. In [8] the authors report a localization error of up to 5 meters in low-cost phones, while others (see [5]) observe measurement errors varying up to 100 meters. The standard in forensic science for evidence source identification is to use likelihood ratios [7]. As different magnitudes of likelihood ratios are not easily explainable in court, forensic standards have been developed that define the verbal equivalents of likelihood ratio ranges to be provided in court [4], with the current version of the standard shown in Table 1. In [5] the authors computed a likelihood ratio to compare two proposed device locations (P1 and P2) in light of the evidence E. They also considered that the errors of GPS measurements may not be equal in all directions (the horizontal error depends on the direction), resulting in high computational complexity due to sample dependence and brute-force distribution fitting.

3 DATASETS
For all used datasets we provide the data sources as well as the scripts to transform data from other works into the input format for our approach. The scripts used to transform and clean the datasets that served as modeling inputs are also provided.

The University of Lausanne dataset (UNIL) was obtained from the public implementation of [5]. It consists of 699 GPS measurements that were taken from two proposal points as reference measurements on the University of Lausanne campus. The dataset is visualized in Figure 1.

Figure 1: A geographical visualization of the UNIL dataset from [5]. E is the single localization that was recovered, while P1 and P2 are two proposed locations. Only deduplicated reference measurements for both proposals are shown. Note how reference measurements are not even on the same building as the proposal they were measured from.

Our Ljubljana (LJ) dataset consists of 4 predefined points (evidence E and three proposal points P1, P2, and P3, each specified by latitude and longitude) and a total of 450 images captured while standing on those proposal points, along with information for every image on whether the iPhone camera app had permission for precise location. A visualization of the dataset is shown in Figure 2.

Figure 2: A geographical visualization of our LJ dataset. Only deduplicated measurements are shown.

Our Novo mesto (NM) dataset consists of 4 predefined points (evidence E and three proposal points P1, P2, and P3, each specified by latitude and longitude) and a total of 429 images captured using the same data collection process as the LJ dataset. A visualization of the dataset is shown in Figure 3.

Figure 3: A geographical visualization of our NM dataset. Only deduplicated measurements are shown.

Compared to the UNIL dataset, the LJ and NM datasets have significantly less measurement error (on average 15 meters) and a high percentage (90%) of duplicate measurements. This is because the measurements were taken in a very short time interval (30 minutes), which resulted in many cached duplicates.
4 METHODS
Unless specified otherwise, everything in this section applies to all used datasets (UNIL, LJ, and NM).

4.1 LJ and NM dataset collection
To more clearly understand what affects the accuracy of GPS evidence, as well as the error patterns, we collected two additional datasets with the same device. The distance between the predefined points was around 100 meters in Ljubljana and 10 meters in Novo mesto. To take the images, an iPhone 11 Pro was used with the default camera app. We also noted the granting or denying of precise location permission to the camera app for each image. To obtain GPS measurements from the images, we extract the latitude, longitude, and time of capture from the EXIF data of each image, and note the corresponding proposal point it was taken from and whether precise location permission was granted.

4.2 Dataset preprocessing
Each measurement is defined by time, latitude, longitude, and the label of the proposal it was taken from. For the LJ and NM data we keep only measurements that were retrieved when precise location permission was given to the iPhone camera app. We remove consecutive duplicate GPS measurements per proposal by sorting all corresponding measurements ascending by date and time, then removing all consecutive duplicates based on latitude and longitude. This is done because consecutive duplicates could be due to caching and/or rate-limiting of GPS queries. Consequently, if we modeled the distance and angle errors of GPS measurements without deduplication, some angles/distances would have artificially more probability mass due to duplicates, even though these duplicates are obtained from the same GPS measurement.
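The EXIF extraction (Section 4.1) and consecutive-duplicate removal (Section 4.2) can be sketched as follows. This is a sketch rather than the released scripts; it uses Pillow's GPS IFD and assumes the usual degrees/minutes/seconds rationals.

```python
import pandas as pd
from PIL import Image
from PIL.ExifTags import GPSTAGS

GPS_IFD = 0x8825  # EXIF pointer to the GPS info directory

def _to_degrees(dms, ref):
    """Convert EXIF (deg, min, sec) rationals to signed decimal degrees."""
    deg = float(dms[0]) + float(dms[1]) / 60 + float(dms[2]) / 3600
    return -deg if ref in ("S", "W") else deg

def read_gps(path):
    """Return (latitude, longitude, capture time) read from an image's EXIF data."""
    exif = Image.open(path).getexif()
    gps = {GPSTAGS.get(k, k): v for k, v in exif.get_ifd(GPS_IFD).items()}
    lat = _to_degrees(gps["GPSLatitude"], gps["GPSLatitudeRef"])
    lon = _to_degrees(gps["GPSLongitude"], gps["GPSLongitudeRef"])
    return lat, lon, exif.get(0x0132)          # 0x0132 = DateTime tag

def drop_consecutive_duplicates(df: pd.DataFrame) -> pd.DataFrame:
    """Per-proposal removal of consecutive identical fixes, as in Section 4.2."""
    df = df.sort_values("time")
    changed = df["lat"].ne(df["lat"].shift()) | df["lon"].ne(df["lon"].shift())
    return df[changed]
```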
4.3 Transforming GPS to polar coordinates
To simplify the modeling and representation of GPS errors, each measurement was converted from (latitude, longitude) coordinates to a distance (in meters) and an angle (azimuth from north, in radians) from the ground truth point (proposal) it was taken from. To illustrate the concept, we show the UNIL dataset transformed to polar coordinates in Figure 4. We aim to model the distance and directionality of GPS errors taken from proposal points (and consequently their DGP) to estimate under which proposal point the retrieved evidence point E is more likely.

Figure 4: The UNIL dataset [5] after coordinate transformation (distance error in meters and angle: azimuth from north). Note that we only have one evidence point E; we transform it relative to (as seen from) each proposal point separately. Proposals P1 and P2 are at the center of the polar plot, and the measurements show the directionality (angle) and magnitude (in meters) of the GPS errors.

4.4 Probabilistic model of GPS errors
To formalize our approach to modeling GPS errors, we define a probabilistic model and estimate its parameters with MCMC sampling. The model is defined in the Stan probabilistic programming language [6], which allows users to define (log) density functions and then perform Bayesian inference with MCMC sampling.

Let $M_{ij} = (d_j, \phi_j)$ be the $i$-th measurement measured from proposal $P_j$, where $d_j$ is the distance in meters from $P_j$ to $M_{ij}$ and $\phi_j$ the azimuth (in radians) of the line between $P_j$ and $M_{ij}$. Analogously, let $M_{ik}$ be the $i$-th reference measurement measured from proposal $P_k$. The set of measurements for each proposal was modeled as a bivariate normal distribution:

$$M_{ij} \sim \mathrm{MultiNormal}(\mu_j, \Sigma_j), \qquad M_{ik} \sim \mathrm{MultiNormal}(\mu_k, \Sigma_k)$$
$$\mu_j = [\mu_{\mathrm{dist}_j}, \mu_{\mathrm{angle}_j}], \qquad \mu_k = [\mu_{\mathrm{dist}_k}, \mu_{\mathrm{angle}_k}]$$
$$\Sigma_j \in \mathbb{R}^{2 \times 2}, \qquad \Sigma_k \in \mathbb{R}^{2 \times 2}$$

where $\mu_j$ is a mean vector of distance (in meters) and angle (in radians) relative to proposal $P_j$, and $\Sigma_j$ is the covariance matrix¹; analogously for $\mu_k$ and $\Sigma_k$ relative to proposal $P_k$. Stan's default, non-informative priors were used for all parameters.

To compare under which proposal ($P_j$ or $P_k$) the evidence point $E$ is more likely, we compute the likelihood of $E$ under each of the models and form the likelihood ratio. To enhance numerical stability without loss of expressiveness, the logarithms of the likelihoods were used, which can later be exponentiated back. This was implemented in Stan's generated quantities block²:

$$\log L_j = \log P(E_j \mid \mu_j, \Sigma_j), \qquad \log L_k = \log P(E_k \mid \mu_k, \Sigma_k)$$
$$\log LR = \log L_j - \log L_k$$

where $E_j$ and $E_k$ denote the evidence point $E$ transformed relative to proposals $P_j$ and $P_k$, respectively.

To assess whether the models capture the input data (reference measurements) well, a posterior predictive check was performed by randomly sampling points from the estimated bivariate normal models to create replicate datasets (this is also done in the generated quantities block in Stan). The idea is that if an estimated model fits the input data well, we should be able to generate similar, synthetic data by randomly sampling from it. In other words, if the estimated model managed to capture the behavior of the distance and angle errors in our reference measurements, it should be able to generate new, synthetic measurements that resemble the same distance and angle errors. To visualize this, we overlay the generated synthetic data over the reference measurements (input data).

¹We use the Cholesky parameterization of the multivariate normal ($\Sigma_j = L_j L_j'$), which is natively implemented in Stan, for efficiency and numerical stability during MCMC sampling, but omit it here for brevity.
²Everything in the generated quantities block can be computed outside of Stan (e.g. in Python), as it is performed on the posterior draws after the MCMC sampling is done; we do it in Stan for clarity.
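Per footnote 2, the generated-quantities computation can equally be performed in Python on the posterior draws. The following is a sketch for a single draw (function names are illustrative), with pyproj's geodesic inverse providing the distance-azimuth transform of Section 4.3:

```python
import numpy as np
from pyproj import Geod
from scipy.stats import multivariate_normal

geod = Geod(ellps="WGS84")

def to_polar(proposal, point):
    """Distance (m) and azimuth from north (rad) of `point` as seen from `proposal`.
    Both arguments are (lat, lon) pairs; Geod.inv expects lon/lat order."""
    az, _, dist = geod.inv(proposal[1], proposal[0], point[1], point[0])
    return np.array([dist, np.deg2rad(az)])

def log_lr(E, P_j, P_k, mu_j, Sigma_j, mu_k, Sigma_k):
    """log LR = log p(E | model_j) - log p(E | model_k) for one posterior draw."""
    E_j = to_polar(P_j, E)           # evidence transformed relative to each proposal
    E_k = to_polar(P_k, E)
    log_Lj = multivariate_normal.logpdf(E_j, mean=mu_j, cov=Sigma_j)
    log_Lk = multivariate_normal.logpdf(E_k, mean=mu_k, cov=Sigma_k)
    return log_Lj - log_Lk
```

Evaluating log_lr over every posterior draw of the means and covariances yields the posterior distribution of log-likelihood ratios reported in the results below.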
5 RESULTS
Due to the length limit of the paper, we only show the full results for the UNIL dataset. However, all results, visualizations, and MCMC diagnostics are available in the provided repository. MCMC sampling with 4 chains of 4000 samples each was performed to sample from the posterior. Standard MCMC diagnostics (trace plots, effective sample sizes, R-hat values) do not indicate any issues in convergence. Additionally, we visualize a posterior predictive check of the UNIL dataset by overlaying a random replicate dataset over the real measurements in Figure 5.

Figure 5: A posterior predictive check for the UNIL dataset in the form of a randomly selected replicate dataset for each proposal. All generated measurements for P1 and P2 are within expected regions; however, the model for P1 is better supported by the real measurements, as the dataset heavily favors P1 as the source of E. This is also noted by the authors of the dataset [5].

Figure 6 depicts the posterior distribution of log-likelihood ratios for the UNIL dataset along with 95% highest-density intervals. In line with the dataset, the P1 proposal is heavily favored over P2 as having generated the evidence point.

Figure 6: The posterior distribution of log-likelihood ratios for the UNIL dataset along with 95% highest-density intervals to quantify uncertainty.

Log-likelihood ratios can be converted back to likelihood ratios³, which are currently the standard used in courts, as per [4] and [5], to compare evidence for source identification in forensic science. While the model is stable and MCMC converges, even the lower bound of the 95% highest-density-interval log-likelihood ratio is log LR = 11, which after exponentiation is LR = exp(11) ≈ 59,874, orders of magnitude above the highest obtainable LR range for a single piece of evidence (see Table 1 and the standard specification in [4]).

Table 1: LR verbal equivalents to use in court when comparing two propositions, obtained from [4], page 39.

Range of LR   Verbal equivalent
1-3           In my opinion the observations are no more probable if [P1] rather than [P2] were true. Therefore, the observations do not assist in addressing which of the two propositions is true.
4-10          In my opinion the observations are slightly more probable if [P1] rather than [P2] were true.
10-100        In my opinion the observations are more probable if [P1] rather than [P2] were true.
100-1000      In my opinion the observations are much more probable if [P1] rather than [P2] were true.

6 DISCUSSION
Our method, utilizing MCMC sampling to estimate the data-generating processes of GPS measurements, offers direct uncertainty quantification, greater computational efficiency, and numerical stability (due to Stan, MCMC, and working with log-likelihood ratios instead of likelihood ratios) compared to the seminal method from [5]. The main limitation of our approach is the assumption that reference measurements taken (months) after the original evidence are enough to sufficiently model the DGP of GPS errors. Even if the exact device from the crime scene is used, many other variables are out of our control (GPS satellite visibility, noise and interference of GPS positioning, software updates changing the measuring process, and the cellular and WiFi networks that are used to improve location). Investigators should always strive to gather more actual evidence (i.e. more GPS logs from the crime scene) to directly model the errors instead of using a proxy such as reference measurements.

³Highest-density intervals are not equal-tailed; this means that when applying transformations, such as exponentiation, to the whole distribution, the HDI will change. For such cases we recommend computing equal-tailed credible intervals, which are not affected by distribution transformations.
REFERENCES
[1] E. Casey, D.-O. Jaquet-Chiffelle, H. Spichiger, E. Ryser, and T. Souvignet. Structuring the evaluation of location-related mobile device evidence. Forensic Science International: Digital Investigation, 32:300928, 2020.
[2] Ema Leila Grošelj, Nika Molan, and Klemen Vovk. A Bayesian Approach to Modeling GPS Errors for Comparing Forensic Evidence, 2024. https://github.com/KlemenVovk/gps_evaluation, last accessed on 2024-08-30.
[3] C. Galbraith, P. Smyth, and H. S. Stern. Statistical methods for the forensic analysis of geolocated event data. Forensic Science International: Digital Investigation, 33:301009, 2020.
[4] F. S. Regulator. Development of evaluative opinions. Technical Report FSR-C-118, UK Forensic Science Regulator, Birmingham, 2021.
[5] H. Spichiger. A likelihood ratio approach for the evaluation of single point device locations. Forensic Science International: Digital Investigation, 2023.
[6] Stan Development Team. Stan modeling language users guide and reference manual, version 2.35, 2011-2024.
[7] M. M. Vink, M. M. Sjerps, A. A. Boztas, et al. Likelihood ratio method for the interpretation of iPhone Health app data in digital forensics. Forensic Science International: Digital Investigation, 41:301389, 2022.
[8] L. Wang, Z. Li, N. Wang, and Z. Wang. Real-time GNSS precise point positioning for low-cost smart devices. GPS Solutions, 25:1-13, 2021.

Seven Components of Computational Thinking: Assessing the Quality of Dr. Scratch Metrics Using 230,000 Scratch Projects

Gal Bubnič (gb78843@student.uni-lj.si), University of Ljubljana, Faculty of Natural Sciences and Engineering, Ljubljana, Slovenia
Tomaž Kosar (tomaz.kosar@um.si), University of Maribor, Faculty of Electrical Engineering and Computer Science, Maribor, Slovenia
Bostjan Bubnic (bostjan.bubnic@student.um.si), University of Maribor, Faculty of Electrical Engineering and Computer Science, Maribor, Slovenia
DOI: https://doi.org/10.18690/um.feri.6.2024.11

ABSTRACT
Computational thinking has recently extended beyond traditional computing education and is becoming a broad educational movement, focused on teaching and learning critical problem-solving skills across various disciplines. Originating from computer science and programming, the most common learning method still involves educational programming languages like Scratch. Dr. Scratch is a tool designed to assess Scratch projects based on seven components of computational thinking: abstraction, parallelism, logic, synchronization, flow control, user interactivity, and data representation. This study examines the quality of the Dr. Scratch measurement scale. The proposed model considers computational thinking as a latent variable with seven indicators. According to the results of confirmatory factor analysis, five of the computational thinking components were measured satisfactorily, while two were below the accepted level. Based on the results, we recommend conducting an exploratory factor analysis for potential scale refinement.

KEYWORDS
computational thinking, Dr. Scratch, block-based programming, assessment, confirmatory factor analysis
1 INTRODUCTION
Although the phrase computational thinking was introduced as a computer science concept in the early 1980s, the concept was popularized by Jeannette Wing in 2006 [19]. Wing described it as the ability to solve problems, design systems, and understand human behavior by leveraging fundamental computer science concepts. Recently, computational thinking has extended beyond traditional computing education into various interdisciplinary fields. It has been integrated into K-12 education, fostering problem-solving skills from an early age [10]. In addition, disciplines such as biology, physics, and the social sciences are adopting computational thinking principles to tackle complex problems, analyze data, and model systems. This broadening of scope highlights the versatility and importance of computational thinking as one of the fundamental skills for the 21st century [12].

Despite its widespread adoption, there is still no consensus on the precise definition of computational thinking. Moreover, there is no consensus concerning its definitive or necessary components [5]. However, several studies have investigated the components that form its foundation. Based on the literature review: 1) Kalelioglu et al. [9] advocated that the most important components are abstraction, problem-solving, algorithmic thinking, and pattern recognition; 2) Bubnic and Kosar [5] identified abstraction and algorithms as relevant, domain-independent components; 3) Lyon and J. Magana [11] argued that abstraction is the most definitional term.

Since computational thinking originates from computer science and programming, it is commonly learned and assessed today through educational programming languages like Scratch. Scratch was created by the Lifelong Kindergarten Group at the MIT Media Laboratory to provide a new environment for beginner programmers. Scratch programs are created using scripts assembled by dragging and dropping blocks, which symbolize various programming elements, such as expressions, conditions, statements, and variables. This approach helps avoid common syntax errors, which often frustrate students. The programming environment also features interactive, 2-dimensional animations called sprites, which move on the screen according to user input or script commands. In addition, audio and video clips from webcams can be incorporated into Scratch projects.

Following the creation of Scratch, its user base grew rapidly. Due to this rapid expansion, the need for evaluation tools became more evident. In response, researchers developed several tools aimed at evaluating Scratch projects, such as Dr. Scratch [13] and Hairball [2]. Dr. Scratch is an online tool which automatically assesses Scratch projects based on seven components of computational thinking: abstraction (e.g. custom blocks), parallelism (e.g. two or more simultaneous scripts), logic (e.g. logic operations), synchronization (e.g. wait until), flow control (e.g. repeat until), user interactivity (e.g. input sound), and data representation (e.g. variables, lists).

The objective of this study was to examine Dr. Scratch's method for measuring computational thinking. The motivation for this study arises from the novelty of applying our method, particularly on such a large dataset. Utilizing a publicly available dataset containing over 230,000 Scratch projects, a latent variable model was proposed. The model considers computational thinking as a latent variable with seven indicators. Confirmatory factor analysis was used to assess the quality of the proposed model. Our results showed that five components of computational thinking were measured satisfactorily, while two were below the accepted level.

2 BACKGROUND
After the creation of Scratch, researchers and educators started analyzing Scratch programs. However, evaluating programs in Scratch proved to be challenging due to the platform's block-based, visual format and the wide range of programming approaches used by beginners. These challenges led to the creation of tools for assessing Scratch programs. Hairball [2] was one of the first tools created, designed to analyze the code of Scratch projects for common programming patterns and potential coding issues. Following Hairball, Ninja Code Village [17] emerged as a platform which offered more interactive feedback, helping users improve their coding skills by identifying areas in their projects that could be optimized or corrected. Finally, Dr. Scratch [13] was introduced to provide a more comprehensive evaluation, assessing computational thinking skills across seven indicators: abstraction, parallelism, logic, synchronization, flow control, user interactivity, and data representation. Each component is measured on a scale from 0 to 3, with 0 being the lowest and 3 the highest. The final score is the sum of all 7 components, thus ranging from 0 to 21 [13].

Several studies have investigated the quality of the Dr. Scratch metrics. A study by Moreno-León et al. [14] compared Dr. Scratch scores with Halstead's metrics and McCabe's cyclomatic complexity, where vocabulary and length were the selected measures. Ninety-five Scratch projects were selected for the study, with a wide range of Dr. Scratch scores, varying from 5 to 20. According to the results, both complexity measures exhibited a strong positive correlation with the scores from Dr. Scratch. Another study by Moreno-León et al. [13] examined the ecological validity of Dr. Scratch. The sample consisted of 109 participants, aged between 10 and 14 years, from 8 different Spanish schools. Each participant first submitted a Scratch project to Dr. Scratch; based on the feedback, students could improve their Scratch projects using the recommendations and suggestions provided by the tool. The results showed a statistically significant score increase based on the feedback by Dr. Scratch. The convergent validity of Dr. Scratch was studied by Moreno-León et al. [16]. Fifty-three Scratch projects were evaluated by 16 specialists with a solid understanding of computer science education in the first stage of the experiment; more than 450 evaluations were conducted. The same projects were graded by Dr. Scratch in the second stage. A strong correlation was identified between the scores from Dr. Scratch and the evaluations by the computer science education specialists. Last but not least, the discriminant validity of Dr. Scratch was demonstrated by Moreno-León et al. [15], who examined 250 Scratch projects segmented into five categories, including games, art, music, stories, and animations.
3 METHOD
To assess the quality of the Dr. Scratch measurement scale, we first obtained a dataset of Scratch projects¹, which was constructed by Aivaloglou et al. [1]. The authors collected data from more than 250,000 Scratch projects, from more than 100,000 different users. After collecting data from the Scratch repository, the authors also analyzed the collected projects with Dr. Scratch. As a result, the dataset comprises 231,050 Scratch projects which were successfully evaluated using the Dr. Scratch metrics [1].

After obtaining the dataset, a Grades table was extracted, and a latent variable model was constructed based on the data in the Grades table. Latent variable models are statistical models that relate a set of unobservable (latent) variables and a set of observable (indicator) variables [4]. In our study, computational thinking was introduced as a latent variable with seven indicators, namely abstraction, parallelism, logic, synchronization, flow control, user interactivity, and data representation. We used confirmatory factor analysis to examine the validity, reliability, and factor structure of the proposed measurement model. The model is presented in Figure 1.

Figure 1: Latent variable model of Dr. Scratch with factor loadings derived from confirmatory factor analysis (CFA).

¹https://github.com/TUDelftScratchLab/ScratchDataset
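The authors fitted the model in IBM AMOS 26; for reference, the same one-factor specification can be written in Python with the semopy package. This is a sketch that assumes the Grades table has been exported to a CSV with the seven component scores as columns under these names.

```python
import pandas as pd
from semopy import Model, calc_stats

# One latent factor (CT) measured by the seven Dr. Scratch components.
DESC = ("CT =~ Abstraction + Parallelism + Logic + Synchronization"
        " + FlowControl + UserInteractivity + DataRepresentation")

grades = pd.read_csv("grades.csv")   # hypothetical export of the Grades table
model = Model(DESC)
model.fit(grades)
print(model.inspect())               # loadings and (co)variances with standard errors
print(calc_stats(model))             # fit indices such as chi-square, CFI, RMSEA
```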
4 DATA ANALYSIS AND RESULTS
Confirmatory factor analysis (CFA) was conducted using IBM AMOS 26. We used Microsoft Excel and IBM SPSS for calculating means, standard deviations, composite reliabilities (CR), and average variances extracted (AVE). The model with the estimates for the factor loadings is presented in Figure 1.

The χ² value (χ²(14) = 66,057) was significant, and the RMSEA was greater than the suggested threshold of 0.08 (RMSEA = 0.143). This would suggest that we had no statistical support for accepting the proposed model. However, in line with representative literature [e.g., 3], χ² may not be the only appropriate standard, particularly when sample sizes are large. Accordingly, we used additional fit indices to assess the goodness of fit, including GFI, RMR, NFI, IFI, and CFI. GFI was above 0.9 and RMR was lower than 0.1, which indicated a good fit of our model [8]. Furthermore, NFI, IFI, and CFI were all slightly below 0.9, but still acceptable. According to these results, we concluded that the overall model-data fit was acceptable. The measurement model fit indices are presented in Table 1.

Table 1: Model fit indices for the Dr. Scratch model

χ²       df    p(χ²)    GFI      RMR      NFI      IFI      CFI      RMSEA
66,057   14    0.001    0.919    0.058    0.870    0.870    0.870    0.143

df: degrees of freedom; GFI: goodness-of-fit index; RMR: root mean square residual; NFI: normed fit index; IFI: incremental fit index; CFI: comparative fit index; RMSEA: root mean square error of approximation.

The standardized factor loadings, composite reliability (CR), and average variance extracted (AVE) are presented in Table 2. The standardized factor loadings for abstraction, parallelism, synchronization, logic, and data representation were all higher than 0.6, varying from 0.63 to 0.74. Such results pointed toward satisfactory convergent validity of these components [7]. On the other hand, the factor loadings for user interactivity and flow control were below the acceptable level of 0.6. CR was higher than the suggested threshold of 0.8 (CR = 0.82), which confirmed the reliability of the computational thinking construct [7]. On the other hand, AVE was lower than 0.5 (AVE = 0.40), which indicated that the convergent validity of the proposed measurement model might be weaker than anticipated.

Table 2: Means, standard deviations, loadings, composite reliabilities (CR), and average variances extracted (AVE) for the Dr. Scratch model

Latent   Indicator             M       SD      Loading
CT       Abstraction           1.057   0.796   0.63
         Parallelism           1.148   1.079   0.65
         Logic                 0.756   1.092   0.71
         Synchronization       1.233   1.074   0.66
         Flow Control          1.887   0.610   0.58
         User Interactivity    1.563   0.530   0.42
         Data Representation   1.273   0.682   0.74
CR = 0.821, AVE = 0.402
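For reference, with standardized loadings $\lambda_i$ of the $n = 7$ indicators and error variances $\theta_i = 1 - \lambda_i^2$, the two reliability statistics are computed as follows (standard formulas, not restated in the paper):

```latex
\mathrm{CR}  = \frac{\bigl(\sum_{i=1}^{n} \lambda_i\bigr)^2}
                    {\bigl(\sum_{i=1}^{n} \lambda_i\bigr)^2 + \sum_{i=1}^{n} \theta_i},
\qquad
\mathrm{AVE} = \frac{1}{n} \sum_{i=1}^{n} \lambda_i^2
```

Plugging the Table 2 loadings into these formulas reproduces the reported values (CR ≈ 0.82, AVE ≈ 0.40) and makes the discrepancy discussed below concrete: CR depends on the squared sum of the loadings, while AVE averages the squared loadings, so the two weak indicators depress AVE far more than they depress CR.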
5 DISCUSSION
Dr. Scratch is an assessment tool for evaluating Scratch projects based on seven components of computational thinking. This study employed confirmatory factor analysis to evaluate the quality of the Dr. Scratch measurement scale. To construct a measurement model, computational thinking was introduced as a latent variable with seven indicators, corresponding to the seven components of computational thinking used by Dr. Scratch. While several studies have previously examined the validity and reliability of Dr. Scratch on smaller samples, our study utilized a large sample of more than 230,000 Scratch projects.

According to the results of the confirmatory factor analysis, the factor loadings of abstraction, parallelism, synchronization, logic, and data representation were above the selected threshold of 0.6. Such a threshold indicates that at least 36% of the variance in the aforementioned components is explained by computational thinking. In this context, we consider that five computational thinking components were measured satisfactorily. In addition, Hair et al. [7] suggested that, ideally, a factor loading should be at least 0.7; in that case, 49% of the variance in the observed variable is explained by the latent variable. According to our results, only data representation and logic surpass the 0.7 threshold. In this context, data representation tends to be the prime component on the Dr. Scratch scale. The factor loading for flow control was slightly below the threshold (0.58), while the value for user interactivity was only 0.42. Consequently, only 18% of the variance in user interactivity is explained by computational thinking. Accordingly, flow control and user interactivity were not measured effectively. User interactivity tends to be the weakest component on the Dr. Scratch scale.

Based on the results, the CR of the proposed model (CR = 0.82) demonstrated sound reliability and a high level of internal consistency. This suggests that the seven components consistently measure computational thinking. However, AVE was below 0.5 (AVE = 0.40), showing that, on average, only 40% of the variance of the computational thinking components is explained by computational thinking. The values of CR and AVE revealed a discrepancy between internal consistency and the amount of variance explained by computational thinking. Such a discrepancy could be attributed to low indicator quality, low factor loadings, or measurement errors [6]. Regarding low indicator quality, the high CR suggests that the indicators reliably measure the same construct, but the low AVE indicates that the indicators may not capture the construct very well. This means that while the Dr. Scratch assessment items are consistent with each other, they do not explain much of the variance of computational thinking. Concerning potentially low factor loadings, the loadings for user interactivity and flow control were lower than anticipated. Since AVE is a function of the squared factor loadings, lower loadings can result in a lower AVE even when CR is high. Regarding the potential presence of measurement error, a lower than expected AVE could be due to high measurement error in the indicators: even if the indicators are internally consistent, significant measurement errors can reduce the proportion of variance explained by the construct.

5.1 Limitations
The results are primarily limited to the data extracted from a publicly available dataset, which includes only projects submitted to the Scratch repository up until 2017.
It is possible that the programming habits of Scratch users have evolved over time. Another limitation exists because Scratch projects were not randomly selected from the Scratch repository. Instead, a scraper program collected the most recent projects available at the time it was running [1]. According to Aivaloglou et al. [1], this limitation was somewhat mitigated by collecting a large dataset, which comprises around 1.3% of the 19 million shared Scratch projects.

Another limitation of our approach stems from the fact that Dr. Scratch was designed as a formative assessment, rather than a diagnostic tool or measurement scale. To fully and comprehensively assess computational thinking, Dr. Scratch needs to be supplemented with other types of tools, such as a computational thinking test [18].

6 CONCLUSION
This study evaluated the quality of the Dr. Scratch measurement scale using a publicly available dataset with more than 230,000 Scratch projects submitted to the Scratch repository up until 2017. According to the results of the confirmatory factor analysis, five computational thinking components were measured satisfactorily, whereas two were below the accepted level. In addition, lower than anticipated average variance extracted indicated potential issues with the measurement model, such as weak indicators. To address these concerns, we plan to further investigate the Dr. Scratch scale in the future, using the same dataset.
Exploratory factor analysis could be a valuable starting point for potential scale refinement. In addition, it would be beneficial to conduct the same analyses on Scratch projects submitted after 2017 and compare the results.

ACKNOWLEDGMENTS
The authors would like to acknowledge Marcos Román-González and Gregorio Robles, the creators of Dr. Scratch, for their valuable suggestions to improve this work.

REFERENCES
[1] Efthimia Aivaloglou, Felienne Hermans, Jesús Moreno-León, and Gregorio Robles. 2017. A dataset of Scratch programs: scraped, shaped and scored. In 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR). IEEE, 511–514.
[2] Bryce Boe, Charlotte Hill, Michelle Len, Greg Dreschler, Phillip Conrad, and Diana Franklin. 2013. Hairball: Lint-inspired static analysis of Scratch projects. In Proceedings of the 44th ACM Technical Symposium on Computer Science Education. 215–220.
[3] Kenneth A. Bollen. 1989. Structural Equations with Latent Variables. John Wiley & Sons.
[4] Kenneth A. Bollen. 2014. Structural Equations with Latent Variables. John Wiley & Sons.
[5] Bostjan Bubnic and Tomaz Kosar. 2019. Towards a consensus about computational thinking skills: Identifying agreed relevant dimensions. In PPIG. 69–83.
[6] Claes Fornell and David F. Larcker. 1981. Evaluating structural equation models with unobservable variables and measurement error. Journal of Marketing Research 18, 1 (1981), 39–50.
[7] J. Hair, B. Black, B. Babin, and R. Anderson. 2010. Multivariate Data Analysis, 7th Edition. Pearson Prentice Hall.
[8] Litze Hu and Peter M. Bentler. 1999. Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal 6, 1 (1999), 1–55.
[9] Filiz Kalelioglu, Yasemin Gülbahar, and Volkan Kukul. 2016. A framework for computational thinking based on a systematic research review. Baltic Journal of Modern Computing 4, 3 (2016), 583.
[10] Michael Lodi and Simone Martini. 2021. Computational thinking, between Papert and Wing. Science & Education 30, 4 (2021), 883–908.
[11] Joseph A. Lyon and Alejandra J. Magana. 2020. Computational thinking in higher education: A review of the literature. Computer Applications in Engineering Education 28, 5 (2020), 1174–1189.
[12] Ana Melro, Georgie Tarling, Taro Fujita, and Judith Kleine Staarman. 2023. What else can be learned when coding? A configurative literature review of learning opportunities through computational thinking. Journal of Educational Computing Research 61, 4 (2023), 901–924.
[13] Jesús Moreno-León, Gregorio Robles, and Marcos Román-González. 2015. Dr. Scratch: Automatic analysis of Scratch projects to assess and foster computational thinking. RED. Revista de Educación a Distancia 46 (2015), 1–23.
[14] Jesús Moreno-León, Gregorio Robles, and Marcos Román-González. 2016. Comparing computational thinking development assessment scores with software complexity metrics. In 2016 IEEE Global Engineering Education Conference (EDUCON). IEEE, 1040–1045.
[15] Jesús Moreno-León, Gregorio Robles, and Marcos Román-González. 2017. Towards data-driven learning paths to develop computational thinking with Scratch. IEEE Transactions on Emerging Topics in Computing 8, 1 (2017), 193–205.
[16] Jesús Moreno-León, Marcos Román-González, Casper Harteveld, and Gregorio Robles. 2017. On the automatic assessment of computational thinking skills: A comparison with human experts. In Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems. 2788–2795.
[17] Go Ota, Yosuke Morimoto, and Hiroshi Kato. 2016. Ninja Code Village for Scratch: Function samples/function analyser and automatic assessment of computational thinking concepts. In 2016 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC). IEEE, 238–239.
[18] Marcos Román-González, Jesús Moreno-León, and Gregorio Robles. 2019. Combining assessment tools for a comprehensive evaluation of computational thinking interventions. Computational Thinking Education (2019), 79–98.
[19] Jeannette M. Wing. 2006. Computational thinking. Commun. ACM 49, 3 (2006), 33–35.

Machine Learning Approaches to Forecasting the Winner of the 2024 NBA Championship

Hana Zadravec
hana.zadravec@student.um.si
University of Maribor, Faculty of Electrical Engineering and Computer Science, Maribor, Slovenia

ABSTRACT
Forecasting the winner of the NBA Championship has become more important as there is a large amount of data and the league's popularity is increasing. This research investigates techniques in machine learning to predict the winner of the 2024 NBA Championship. Three methods - random forest regression, SVR, and linear regression - are used and assessed.
The process includes scraping data from Basketball Reference, then analyzing and selecting features. Findings show the leading projected teams for 2024 according to each model, with random forest regression showing the best prediction. Analysis of feature importance emphasizes critical predictors like team quality rating and player performance metrics. The research highlights the capabilities of machine learning in predicting sports outcomes and indicates areas for additional research to improve predictions.

KEYWORDS
forecasting, basketball prediction, statistical analysis, NBA Championship, machine learning

DOI: https://doi.org/10.18690/um.feri.6.2024.12  ISBN 978-961-286-914-4

1 INTRODUCTION
Forecasting the winner of the NBA championship has become increasingly accessible for sports analysts, bettors, and enthusiasts alike. This endeavor prompts the exploration and application of sophisticated analytical methodologies to enhance predictive precision. The necessity for more precise prognostications is underscored by recognizing the NBA's status as the most extensively followed professional sports league in 2022, engaging 2.49 billion individuals [4]. Comprising 30 teams in North America, the NBA stands as a premier basketball league showcasing elite players globally [5]. With an annual revenue surpassing $10 billion, the league continuously accumulates a wealth of data crucial for analysts and strategic planning within sports organizations seeking competitive advantages through data analysis. This data often informs pivotal on-field decisions regarding team formations and gameplay strategies, such as offensive or defensive approaches. Such insights can significantly impact match outcomes.

Moreover, this wealth of data facilitates individual game outcome predictions in the realm of NBA contests. Ahead of each match, numerous analysts proffer their forecasts for the victor. These predictions are scrutinized by commentators on NBA platforms who provide pre-game analysis. Furthermore, a growing betting industry has arisen around prognosticating NBA matchups. This sector expands annually, with a key emphasis on developing precise models adept at handling pertinent metrics in NBA games effectively. Hence, the increasing integration of machine learning models in sports represents a pivotal and adaptable strategy moving forward.

The research motivation comes from the necessity to improve sports analytics in the NBA, aiming for more precise predictions to benefit strategic decisions and operations in the betting sector. Due to the constraints of current models, which often overlook important metrics, this research aims to enhance prediction accuracy by utilizing different machine-learning techniques. This study also seeks to address a deficiency in the literature, which seldom focuses on predicting the champion of the entire season.

This article examines forecasting NBA championship winners by utilizing three machine learning techniques: random forest, support vector regression (SVR), and linear regression. Section 2 examines pertinent studies in the field of predicting sports performance, specifically honing in on NBA results. In Section 3, we provide a comprehensive explanation of the techniques utilized for gathering and examining data, as well as the implementation of the specified models. Next, we discuss the results and evaluate how well each technique worked in Section 4. Section 5 explores the importance of our findings for future research in this area. Finally, our analysis leads to conclusions in Section 6.

2 LITERATURE REVIEW
Advancements in predictive modeling for basketball are increasingly important within sports analytics. With the growing integration of machine learning in this field, researchers continue to explore strategies for improving predictions. This section reviews studies focused on forecasting basketball outcomes using various machine-learning techniques.

Yongjun et al. [3] propose using data analysis to forecast NBA team performance by combining statistical regression methods to predict the relationship between game results and winning chances. They apply Data Envelopment Analysis (DEA) to identify optimal performance standards, evaluating their process with the Golden State Warriors, which demonstrated high predictive accuracy. The study suggests enhancing predictions by incorporating rival tactics and expanding the model for game-level forecasts for each player.
Bunker and Susnjak [1] analyze the use of machine learning methods for forecasting match outcomes in team sports. They evaluate various algorithms, including regression models, decision trees, and neural networks, assessing their predictive success with past data and player statistics. The study emphasizes advancements in machine learning that enhance accuracy and the challenges faced in practical applications. It also highlights the importance of data quality and feature selection in improving sports analytics, providing valuable insights for future research.

Yao [8] assesses how well neural networks predict outcomes compared to traditional regression models using past NBA data. The results indicate that regression models provide straightforward explanations, while neural networks are better at capturing intricate patterns, leading to increased precision. This research highlights how advanced machine learning methods can be utilized in sports analysis to improve performance prediction strategies.

The study by Huang and Lin [2] introduces an innovative method for predicting game results using regression tree models. The writers examine different elements impacting game results, like player data and team interactions, to create a targeted predictive system for the Golden State Warriors. Their research shows that regression trees can capture intricate connections in the data, leading to precise predictions of scores.

Thabtah et al. [7] explore the use of machine learning to forecast NBA game outcomes in their research. Numerous learning models, such as decision trees, artificial neural networks, and Naive Bayes, have been applied. After analyzing the data, it was discovered that important characteristics, including total rebounds, defensive rebounds, three-point percentage, and the quantity of made free throws, are essential for accurately predicting the outcome of games.

3 METHODOLOGY
In this section, we provide a comprehensive explanation of the approach utilized for examining NBA data in our research. All analyses were performed using a Jupyter notebook. The primary objective was to collect, process, and analyze NBA data to develop models for forecasting outcomes for the 2024 NBA season.

3.1 Data Collection
The initial step involved gathering data through web scraping methods. We sourced data from the Basketball Reference website [6], which offers extensive statistics for every NBA season up to the present day. Web scraping was chosen for its efficiency in collecting large volumes of data without manual input. We used Python libraries such as BeautifulSoup and requests to retrieve HTML content from the web pages. The pertinent data, including player stats, team stats, and game outcomes, were extracted and organized into CSV files for ease of manipulation in subsequent analyses.

3.2 Data Preprocessing
Following data collection, we performed meticulous processing to ensure data quality and consistency. This included:

• Filling missing NaN values with the median to preserve the dataset and minimize their impact on the analysis.
• Standardizing the data as needed to maintain uniformity across the dataset.
• Normalizing features to bring them into a common scale, which is especially important for algorithms sensitive to the magnitude of input features.
• Encoding categorical variables into numerical format using label encoding, allowing for their inclusion in machine learning models.
• Removing duplicates to eliminate redundancy in the dataset and improve model performance.

3.3 Feature Selection
To enhance model performance, we addressed multicollinearity by filtering features based on their correlation. Pearson's correlation coefficient was used to assess the linear relationship between features. The coefficient r is calculated as:

  r = cov(X, Y) / (σ_X σ_Y)    (1)

where cov(X, Y) is the covariance between variables X and Y, and σ_X and σ_Y are their standard deviations. Pearson's r ranges from -1 to 1:

• r = 1 indicates a perfect positive linear relationship,
• r = -1 indicates a perfect negative linear relationship,
• r = 0 indicates no linear relationship.

We set a correlation threshold of 0.9, identifying features with an absolute value of Pearson's r above this threshold as highly correlated. Features exhibiting high correlation were removed to reduce redundancy and mitigate multicollinearity, thus enhancing model interpretability and reliability.

Additionally, we utilized Principal Component Analysis (PCA) as a dimensionality reduction technique to transform the feature space, allowing us to reduce the number of features while retaining essential information, further enhancing model performance.
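A minimal pandas sketch of this filtering step; the DataFrame X and the 0.9 cutoff follow the description above, while the tie-breaking rule (keep the earlier feature of each correlated pair) and the PCA variance target are our assumptions:

    import pandas as pd

    def drop_highly_correlated(X: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
        """Drop one feature from every pair with |Pearson r| above the threshold."""
        corr = X.corr(method="pearson").abs()
        cols = list(corr.columns)
        to_drop = set()
        for i in range(len(cols)):
            for j in range(i + 1, len(cols)):
                if corr.iloc[i, j] > threshold:
                    to_drop.add(cols[j])  # keep the earlier feature of the pair
        return X.drop(columns=sorted(to_drop))

    # Usage, assuming X holds the numeric team/player features:
    # X_filtered = drop_highly_correlated(X, threshold=0.9)
    # from sklearn.decomposition import PCA
    # X_reduced = PCA(n_components=0.95).fit_transform(X_filtered)  # 95% variance kept (assumed)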
3.4 Model Selection and Data Splitting
Data was divided into training and testing sets as follows:

• Training Data: Data from the 2005 to 2023 NBA seasons.
• Testing Data: Data from the 2024 NBA season.

We employed three machine learning models to forecast NBA game outcomes:

3.4.1 Support Vector Regression (SVR). Reason for Selection: SVR is selected for its capability to handle complex, non-linear relationships between features and the target variable (game outcome). SVR is effective in scenarios where interactions between variables are intricate.
Advantages: SVR finds optimal hyperplanes to minimize prediction errors within a specified margin, capturing subtle patterns in the data.

3.4.2 Random Forest Regression. Reason for Selection: Random Forest regression was selected for its ensemble method, combining decision trees to improve predictions and manage overfitting. It effectively captures complex interactions in NBA data.
Advantages: This model handles high-dimensional data, identifies key features, and is robust against outliers. It also models non-linear relationships with minimal tuning, making it versatile for both categorical and continuous variables in sports analytics.

3.4.3 Linear Regression. Reason for Selection: Linear Regression is chosen as a baseline model for predicting NBA champions due to its simplicity and interpretability, clarifying how features affect winning likelihood.
Advantages: It offers easy interpretation of feature impacts, with coefficients showing expected outcome changes for unit shifts in predictors. Minimal computational resources are needed for quick training and evaluation, and it serves as a reference for comparing more complex models.
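The chronological split above can be expressed in a few lines; this sketch assumes a tidy DataFrame df with a hypothetical "season" column and the champion_share target introduced in Section 3.5:

    # Chronological split: 2005-2023 seasons for training, 2024 for testing.
    train = df[df["season"].between(2005, 2023)]
    test = df[df["season"] == 2024]

    X_train, y_train = train.drop(columns=["champion_share"]), train["champion_share"]
    X_test, y_test = test.drop(columns=["champion_share"]), test["champion_share"]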
3.5 Experiment
Our experiment focuses on NBA data from the 2005 season onward, due to significant changes in gameplay and statistical tracking. Prior to 2005, the game was more physical and lacked modern statistics like three-point shooting percentage (3P%), which were introduced post-1990. We used data from the 2005 to 2023 NBA seasons to train our models, which included 49 features related to team and player performance. Some of the features are:

• pre_season_odds: The odds assigned to each team before the season starts, indicating their chances of winning the championship.
• team_rating_custom: A custom rating for each team based on various performance metrics, reflecting their overall strength.
• FG%: Field Goal Percentage, representing the ratio of field goals made to field goals attempted, a key indicator of shooting efficiency.
• 3P%: Three-Point Percentage, indicating the ratio of three-point field goals made to three-point attempts, measuring a team's effectiveness from beyond the arc.
• max_player_rating_custom: A custom rating for the highest-rated player on each team, capturing the impact of star players.

The target variable for prediction is champion_share, which represents the chances of a team winning the NBA Championship in the 2024 season. This continuous variable can take values between 0 and 1, where a higher value indicates a greater probability of a team being crowned champion. In the dataset, known winners from previous seasons are marked with a value of 1.0, signifying their championship status, while teams that did not win are represented by lower values closer to 0.

3.5.1 Model Parameters. Default parameters were used for all models:

• SVR: The Support Vector Regression (SVR) employs a Radial Basis Function (RBF) kernel, which is effective for capturing non-linear relationships. The regularization parameter C = 1 controls the trade-off between achieving a low training error and a low testing error, while gamma γ = 0.1 determines the influence of individual training examples on the decision boundary.
• Random Forest: This model consists of 100 trees with no maximum depth specified, allowing each tree to grow fully. This approach enhances model accuracy by averaging predictions from multiple trees, reducing the risk of overfitting.
• Linear Regression: The linear regression model uses default parameters, applying ordinary least squares to estimate coefficients without regularization. This simplicity allows it to fit the data by minimizing the residual sum of squares, serving as a benchmark for more complex models.

3.5.2 Evaluation Metrics. Model performance was evaluated using the following metrics:

• Mean Absolute Error (MAE): Measures the average magnitude of prediction errors. It is defined as:

  MAE = (1/n) Σ_{i=1}^{n} |y_i − ŷ_i|    (2)

where y_i represents the actual value, ŷ_i represents the predicted value, and n is the number of observations.

• Mean Squared Error (MSE): Measures the average squared difference between predicted and actual values. It is defined as:

  MSE = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²    (3)

where y_i represents the actual value, ŷ_i represents the predicted value, and n is the number of observations.
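Putting Sections 3.4 and 3.5 together, the following sketch wires up the three models with the parameters quoted in Section 3.5.1 and scores them with MAE and MSE (Equations 2 and 3). It assumes the X_train/X_test frames from the split sketch above and is our illustration, not the study's notebook:

    from sklearn.svm import SVR
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_absolute_error, mean_squared_error

    models = {
        "SVR": SVR(kernel="rbf", C=1.0, gamma=0.1),                        # Sec. 3.5.1
        "Random Forest": RandomForestRegressor(n_estimators=100, max_depth=None),
        "Linear Regression": LinearRegression(),                          # ordinary least squares
    }

    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        print(name,
              "MAE:", mean_absolute_error(y_test, pred),   # Eq. (2)
              "MSE:", mean_squared_error(y_test, pred))    # Eq. (3)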
4 RESULTS
The results section provides a comparison of the models based on MAE and MSE metrics. The table and figure below illustrate the performance of each model and highlight predictions for the top teams in the 2024 NBA season.

4.1 Model Comparison
Figure 1 presents the comparison of MSE and MAE across the three models used in our study. The Random Forest model achieved the lowest MSE and MAE, indicating superior performance in predicting NBA outcomes compared to SVR and Linear Regression.

Figure 1: Comparison of MSE and MAE for each model.

4.2 Model Predictions
Table 1 presents a summary of the predicted top teams for the 2024 NBA Championship as determined by each model, along with their corresponding champion_share values. These values indicate the estimated probability, expressed as a decimal, of each team winning the NBA Championship in the 2024 season.

Table 1: Top 3 Predicted Teams for 2024 NBA Championship

  Model               Top Predicted Teams       Predicted champion_share
  SVR                 Milwaukee Bucks           0.8691
                      Boston Celtics            0.7510
                      Denver Nuggets            0.6726
  Random Forest       Boston Celtics            0.6485
                      Milwaukee Bucks           0.5643
                      Minnesota Timberwolves    0.5527
  Linear Regression   Denver Nuggets            0.6706
                      Milwaukee Bucks           0.6448
                      Boston Celtics            0.6227

5 DISCUSSION
In this study, we assessed the performance of three predictive models (Random Forest, SVR, and Linear Regression) regarding the NBA Championship outcome for the 2024 season. The results reveal several insights and distinctions between these models.
5.1 Random Forest Model
The Random Forest model achieved the lowest MSE and MAE in predicting the NBA Championship outcome. Its ability to capture complex feature interactions enabled it to identify key determinants of the championship. Notably, it predicted the Boston Celtics as the 2024 season winners, aligning with the actual outcome and validating its predictive performance.

5.2 Support Vector Regression
Although the SVR model did not perform as well as the Random Forest and Linear Regression models in terms of predictive metrics, it was effective in revealing intricate relationships between features. The SVR model assigned high predicted probabilities to both the Milwaukee Bucks and Boston Celtics, reflecting their strong performances throughout the season. However, the actual season outcome highlighted significant challenges for the Milwaukee Bucks, such as injuries to key players and a mid-season coaching change. These factors likely impacted their final standing, demonstrating that while SVR provided valuable insights, it may not fully account for unforeseen disruptions and their effects.

5.3 Linear Regression
Linear Regression, while not as effective as the Random Forest model in predictive metrics, still provided valuable insights. The model's predictions for the Boston Celtics and Denver Nuggets as strong contenders aligned with the final outcome of the championship. This highlights the model's utility in scenarios where more complex methods might be less interpretable. Despite its limitations, Linear Regression contributed to a broader understanding of potential championship winners.

5.4 Real-World Outcome
At the end of the 2024 season, the Boston Celtics were confirmed as the champions, validating the Random Forest model's prediction and partially supporting the SVR model's forecasts. Despite the high values assigned to the Milwaukee Bucks by the SVR model, their performance was hindered by significant issues such as player injuries and a coaching change, which affected their final standing. The Minnesota Timberwolves, who were also predicted to be in contention, remained competitive until the end of the season, demonstrating that our models were accurate in predicting some outcomes.

6 CONCLUSION
This research assessed various machine learning techniques, namely SVR, Linear Regression, and Random Forest models, in forecasting the 2024 NBA season. The Random Forest model outperformed the others, showing its capability to deal with intricate feature relationships by achieving the lowest MSE and MAE. Even though SVR and Linear Regression were not as accurate, they still offered important information on team performance, highlighting the difficulties encountered by the Milwaukee Bucks because of injuries and coaching adjustments. This study highlights the significance of forecasting the whole season instead of single games.

One noteworthy aspect of this study is the emphasis on making predictions for the entire season, as opposed to just individual games. Our results indicate the importance of integrating current data and regularly updating it to enhance the accuracy of predictions. Future research should aim to integrate real-time data with advanced modeling techniques to more effectively adapt to the dynamic conditions and changes that occur throughout the NBA season.

In summary, incorporating various machine learning models and adjusting predictions with real-time data can improve the precision of sports predictions.

REFERENCES
[1] Rory Bunker and Teo Sušnjak. The application of machine learning techniques for predicting match results in team sport: A review. Journal of Artificial Intelligence Research, 75:1–22, 2022.
[2] Mei-Ling Huang and Yi-Jung Lin. Regression tree model for predicting game scores for the Golden State Warriors in the National Basketball Association. Symmetry, 12(5):835, 2020.
[3] Y. Li, L. Wang, and F. Li. A data-driven prediction approach for sports team performance and its application to National Basketball Association. Omega, page 102123, 2019.
[4] National Basketball Association. About the NBA. https://www.nba.com/news/about, 2024. Accessed: 2024-04-30.
[5] PlayToday. NBA viewership statistics. https://playtoday.co/blog/stats/nba-viewership-statistics/, 2024. Accessed: 2024-04-30.
[6] Basketball Reference. Basketball Reference. https://www.basketball-reference.com/, 2024. Accessed: 2024-04-30.
[7] Fadi Thabtah, Ling Zhang, and Nadia Abdelhamid. NBA game result prediction using feature analysis and machine learning. Annals of Data Science, 6(1):103–116, 2019.
[8] Alan Yao. Comparing neural and regression models to predict NBA team records. Frontiers in Artificial Intelligence and Applications, 320:421–428, 2019.
Hit Song Prediction Through Machine Learning and Spotify Data

Andrej Natev
89221050@student.upr.si
University of Primorska, Faculty of Mathematics, Natural Sciences and Information Technologies, Koper, Slovenia

ABSTRACT
This study predicts hit songs using metadata from the Spotify API [8]. The dataset includes over 20 genres, each with 40 songs, equally divided between hits and flops, gathered using spotipy [7]. Prediction is based on the popularity feature, rated from 0-100. Models were trained on features like danceability, energy, loudness, speechiness, valence, and tempo. The dataset was split using train_test_split (10%, 20%, 33%) and k-fold cross-validation with k values of 2, 5, and 10. Models were trained, evaluated, and tested, with k-fold cross-validation showing the best accuracy and the least overfitting. Scikit-learn's classifiers, ensemble models, and MLPClassifier were used, with PassiveAggressiveClassifier and AdaBoost showing 60% accuracy. Ensemble methods like extra trees and random forest, along with neural networks, performed well. Gaussian Process, Naive Bayes, and ridge classifiers stood out among standard models. These results suggest that enhanced models, especially neural networks and decision tree ensembles, could improve hit prediction. Future work may explore frequency and lyric analysis.

KEYWORDS
music, genre, song, Spotify, machine learning, classification, ensemble model, support vector, neural network, artificial intelligence

DOI: https://doi.org/10.18690/um.feri.6.2024.13  ISBN 978-961-286-914-4

Figure 1: Box Plot for Feature Outliers

1 INTRODUCTION
This research delves into the intersection of music and data science, leveraging the Spotify Web API [8] in conjunction with the Spotipy library [7] and machine learning models. By harnessing these tools, the study [2] aims to extract and analyze track data across various genres. The primary objective is to find machine learning models capable of categorizing songs into two distinct groups, "hit" and "flop", based on a range of audio features. Popularity is a feature in Spotify's Web API [8] that represents a song's popularity over the past three days from the day of extraction. It is an integer that ranges from 0-100, such that a flop is any song with popularity below 60 and everything at 60 or above is a hit song.

2 MATERIALS, MODELS, METHODS

2.1 Materials
In this study [2], two datasets from Kaggle were initially utilized: "Most Streamed Songs 2023" [6] and "30000 Spotify Songs" [5]. These datasets provided a rich source of music metadata for analysis, with one of them being a training dataset and the other an evaluation dataset. Later, both of them were discarded because of the chance of overfitting. Spotipy [7], a Python library, was then used for data extraction from the Spotify Web API [8]. Pandas and NumPy were employed for data manipulation, while the Scikit-learn [3] library facilitated feature engineering, preprocessing, data splitting, model implementation, and evaluation. Matplotlib was also used to visually analyze the features and to present the results.

2.2 Most Accurate Models

2.2.1 MLPClassifier with ReLU and Logistic Activation Functions. The MLPClassifier is a neural network model with multiple layers of interconnected nodes. It uses the ReLU (Rectified Linear Unit) activation function, which outputs the input directly if positive or zero otherwise, helping to avoid the vanishing gradient problem. The logistic (sigmoid) activation function, which maps the input into a range between 0 and 1, is particularly useful for binary classification tasks. These activation functions enable the MLPClassifier to capture complex data patterns, improving prediction accuracy. [3]

2.2.2 ExtraTreesClassifier with Gini and Entropy Criteria. The ExtraTreesClassifier is an ensemble method that builds many decision trees using randomized splits of the training data. It uses Gini impurity or entropy as criteria to evaluate the quality of splits within the trees. Gini measures the likelihood of misclassification, while entropy measures uncertainty. By averaging predictions from multiple trees, ExtraTreesClassifier reduces overfitting and improves generalization, making it a robust choice for classification tasks. [3]
2.2.3 GradientBoostingClassifier. GradientBoostingClassifier incrementally builds a strong classifier by sequentially adding weak learners, typically decision trees. Each new model is trained on the residual errors of the previous models, allowing the ensemble to focus on earlier mistakes. This process iteratively reduces error, enhancing accuracy and robustness. GradientBoostingClassifier is particularly effective in complex prediction tasks requiring high precision. [3]

2.2.4 PassiveAggressiveClassifier. PassiveAggressiveClassifier is a linear model suited for large-scale learning tasks, especially in online learning. It updates parameters only when misclassification occurs (passive) and aggressively corrects errors when they do. This combination allows the model to adapt quickly, making it efficient for real-time classification tasks where speed is crucial. [3]

2.2.5 AdaBoostClassifier. AdaBoostClassifier is an ensemble method that builds a strong classifier by combining multiple weak classifiers. It adjusts the weights of misclassified instances in each iteration, focusing subsequent classifiers on difficult cases. By concentrating on previous errors, AdaBoostClassifier progressively improves accuracy, making it a powerful tool for various classification tasks. [3]

2.2.6 RidgeClassifier. RidgeClassifier is a linear model that uses L2 regularization to prevent overfitting by penalizing the magnitude of coefficients. This regularization is particularly useful in high-dimensional spaces where features outnumber observations. RidgeClassifier balances bias and variance, making it effective for classification tasks requiring generalization to unseen data. [3]

2.2.7 LogisticRegression. LogisticRegression is a linear model for binary classification tasks. It models the probability of class membership by applying the logistic function to a linear combination of input features. Trained by maximizing the likelihood of observed data, LogisticRegression effectively handles cases where the feature-target relationship is approximately linear, making it widely used for various classification scenarios. [3]

2.2.8 SVC (Linear Kernel). SVC with a linear kernel is a supervised learning model that constructs a hyperplane in a high-dimensional space to separate classes. The linear kernel computes the dot product of feature vectors, making it effective for linearly separable data. By maximizing the margin between classes, SVC with a linear kernel provides reliable and interpretable classification results. [3]

2.3 Methods

2.3.1 Data Understanding. The first step in our methodology involved understanding the datasets and the task at hand. Initially, songs were obtained from the Spotify Web API [8] based on their popularity ratings, ensuring a balanced representation of hits and flops. The popularity feature served as a crucial criterion for categorizing songs as hits or flops, with hits defined as songs with a popularity score of 60 or above, and non-hits as songs with a popularity score below 60.
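The labeling rule is easy to pin down in code; a minimal sketch, assuming a hypothetical pandas frame of tracks that already carries the Spotify popularity column:

    import pandas as pd

    def label_hit(popularity: int) -> int:
        """1 = hit (popularity >= 60), 0 = flop, per the threshold in Section 2.3.1."""
        return int(popularity >= 60)

    # Hypothetical extract of tracks with audio features and popularity.
    tracks = pd.DataFrame({
        "danceability": [0.81, 0.42],
        "energy": [0.70, 0.33],
        "popularity": [73, 41],
    })
    tracks["hit"] = tracks["popularity"].apply(label_hit)
    print(tracks[["popularity", "hit"]])  # 73 -> 1, 41 -> 0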
2.3.2 Data Extraction. Data extraction was conducted using the Spotipy library along with the Spotipy-random add-on. The extraction process involved sourcing track data directly from the Spotify Web API [8]. In addition to leveraging Spotipy-random, genres were carefully selected to ensure diversity and fairness in the broad range of the features' values. These genres ranged from unpopular to popular and encompassed different audio features to ensure distributivity across the dataset.

During the extraction process, it was observed that songs with popularity ratings above 85 and below 60 were particularly challenging to find, even when considering genres that were both unpopular and popular. This highlights the inherent difficulty in obtaining a balanced dataset, especially when targeting specific popularity ranges.

2.3.3 Data Preparation. Following data acquisition, the dataset was prepared for analysis by incorporating audio features obtained from the Spotify API [8]. These features included danceability, energy, key, loudness, mode, speechiness, acousticness, instrumentalness, liveness, valence, and tempo. Additionally, data types were adjusted to ensure categorical representation for key, mode, time signature, and hit categories. After that, another dataset was made that contained the same features, just scaled using the default StandardScaler from scikit-learn [3].

2.3.4 Model Training and Evaluation. During the model training and evaluation phase, it was observed that the KFold and StratifiedKFold cross-validation techniques [3] encountered difficulties when handling larger splits due to the relatively small size of the dataset. With only 1,299 songs available and an almost 50/50 balance between hits and non-hits, the dataset size posed challenges for these cross-validation methods to effectively cover all instances. Despite these challenges, various machine learning models, including ensemble models and standard classifiers, were trained on the dataset. The performance of each model was evaluated using train_test_split [3] (TTS) with varying test sizes (10%, 20%, and 33%), as well as KFold and StratifiedKFold cross-validation techniques with different fold splits (2, 3, 5, and 10).
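A sketch of this evaluation protocol, assuming a feature matrix X and labels y from the prepared dataset; the model choice and random_state are illustrative:

    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import (KFold, StratifiedKFold,
                                         cross_val_score, train_test_split)

    clf = AdaBoostClassifier()  # one of the evaluated models

    # Train-test splits with the three test sizes used in the study.
    for size in (0.10, 0.20, 0.33):
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=size, random_state=0)
        clf.fit(X_tr, y_tr)
        print(f"TTS {size:.0%}: {accuracy_score(y_te, clf.predict(X_te)):.4f}")

    # KFold and StratifiedKFold with the fold counts used in the study.
    for k in (2, 3, 5, 10):
        for cv in (KFold(n_splits=k, shuffle=True, random_state=0),
                   StratifiedKFold(n_splits=k, shuffle=True, random_state=0)):
            scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
            print(type(cv).__name__, k, round(scores.mean(), 4))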
3 RESULTS
The study [2] aimed to predict hit songs using machine learning algorithms trained on Spotify API metadata. Results revealed varying accuracies across different models and evaluation techniques. Notably, the AdaBoost Classifier and Passive Aggressive Classifier achieved the highest accuracy of 60% on a test size of 33%, followed by the RandomForest (entropy criterion) and Ridge Classifiers, and Logistic Regression, which demonstrated stable performance on the same test size. The MLPClassifier (logistic activation function) and the Support Vector Classifier demonstrated the highest and most consistent accuracy throughout all test sizes (10%, 20%, and 33%).

Regarding the cross-validation techniques, the stratified k-fold had consistently lower accuracy throughout all models and all fold counts (2, 3, 5, and 10). The ExtraTrees Classifier had the highest accuracy with both techniques, using both the Gini criterion and the entropy criterion, at almost 60%. It was followed by the GaussianProcess Classifier, which had a one percent higher accuracy than the similarly named GradientBoosting Classifier. Other notable mentions using the cross-validation techniques are the Passive Aggressive Classifier and the MLP Classifier with the ReLU activation function, with very similar accuracies.

Figure 2: Bar Graph for TTS Accuracy Percentages
Figure 3: Bar Graph for CV Accuracy Percentages

Table 1: Model Accuracy Comparison

  Model                              Train-Test-Split accuracy (%)   k-fold CV accuracy (%)
  RandomForestClassifier (Entropy)   57.69 / 54.62 / 52.68           56.12
  AdaBoostClassifier                 60.00 / 55.38 / 52.91           53.58
  ExtraTreesClassifier               48.46 / 51.54 / 55.24           56.81
  ExtraTreesClassifier (Entropy)     54.62 / 55.38 / 54.31           57.81
  MLPClassifier (Identity)           50.00 / 58.46 / 53.85           55.66
  MLPClassifier (Logistic)           58.46 / 58.46 / 54.78           55.50

4 DISCUSSION
The findings of this study [2] shed light on the potential of machine learning algorithms to predict hit songs based on Spotify metadata [8]. While some models exhibited promising accuracy rates for a fairly general approach to the problem, deviations from expected outcomes were observed, prompting deeper analysis. The AdaBoost Classifier achieved the highest accuracy, but only in the train-test split. Additionally, the MLPClassifier with identity and logistic activation functions showed promising accuracy, suggesting the potential of neural network architectures in capturing nonlinear relationships within the data. Theoretical implications suggest the need for further investigation into ensemble models and neural networks, and hyperparameter tuning to optimize model performance.
4.1 Previous Research
Compared to prior research endeavors, which often grappled with issues of data imbalance and feature scaling, this study's results represent a significant improvement. The utilization of a more balanced dataset, coupled with standardized feature scales, has led to more reliable and interpretable models. The transition from overfitted models, which yielded inflated accuracy rates, to robust and generalizable models underscores the importance of methodological rigor in data science research. [1]

5 CONCLUSION
In summary, this study [2] investigated the application of machine learning algorithms for hit song prediction using Spotify metadata [8]. Through rigorous experimentation and evaluation, we have demonstrated the potential of various classifiers and ensemble methods in categorizing songs into hits and non-hits with reasonable accuracy for a general approach. The findings contribute to the existing body of research by providing insights into the performance characteristics of different models and the impact of algorithmic parameters on predictive outcomes. Despite achieving competitive accuracy rates, the study [2] also revealed nuances and deviations from expected results, creating an even greater need for further investigation.

Moving forward, it is imperative to address open questions surrounding the generalizability of models across diverse music genres, the robustness of predictions over time, and the incorporation of additional features such as lyrics and user-specific preferences. Moreover, future research should focus on refining model architectures, exploring ensemble models and neural networks, and optimizing hyperparameters to enhance predictive efficacy.

ACKNOWLEDGMENTS
Special recognition is also extended to the Google Developer Student Club of the University of Primorska for organizing multiple events focused on ML&AI. It was during these events that the seeds of the initial research, which had overfitting issues, were sown. [4]

REFERENCES
[1] Andrej Natev. 2023. Initial Notebook. https://www.kaggle.com/code/andrejnatev/hit-song-prediction
[2] Andrej Natev. 2024. Main Notebook. https://www.kaggle.com/code/andrejnatev/spotify-api-spotipy-hit-song-prediction
[3] David Cournapeau. 2007. Scikit-learn Documentation. https://scikit-learn.org/stable/index.html
[4] GoogleDSC University of Primorska. 2023. ML&AI Summit. https://gdsc.community.dev/university-of-primorska-koper-slovenia/
[5] Joakim Arvidsson. 2023. 30000 Spotify Songs dataset. https://www.kaggle.com/datasets/joebeachcapital/30000-spotify-songs
[6] Nidula Elgiriyewithana. 2023. Most Streamed Songs 2023 dataset. https://www.kaggle.com/datasets/nelgiriyewithana/top-spotify-songs-2023
[7] Paul Lamere. 2014. Spotipy Documentation. https://spotipy.readthedocs.io/en/2.24.0/
[8] Spotify. 2013. Spotify Web API Documentation. https://developer.spotify.com/documentation/web-api

A Data-Driven Approach for the Analysis of Ridership Fluctuations in Transit Systems

Jovan Pavlović (University of Primorska, FAMNIT, Koper, Slovenia; jovan.pavlovic@famnit.upr.si)
Miklós Krész (InnoRenew CoE, Izola, Slovenia; miklos.kresz@innorenew.eu)
László Hajdu (University of Primorska, FAMNIT, Koper, Slovenia; laszlo.hajdu@famnit.upr.si)
András Bóta (Luleå University of Technology, Luleå, Sweden; andras.bota@ltu.se)

ABSTRACT
This study focuses on identifying critical components within urban public transportation networks, particularly in the context of fluctuating demand and potential pandemic scenarios. By employing advanced agent-based simulations, we analyzed passenger interactions and ridership patterns across the San Francisco Bay Area's transit system. Key findings reveal specific transit stops and routes that are highly sensitive to changes in demand, often serving as bottlenecks or high-risk areas for the spread of infectious diseases.
KEYWORDS
network modeling, public transportation, agent-based simulation, community detection, demand fluctuations

DOI: https://doi.org/10.18690/um.feri.6.2024.14  ISBN 978-961-286-914-4

1 INTRODUCTION
Efficient public transportation is vital for urban mobility, economic productivity, and public health. During the COVID-19 pandemic, transit systems worldwide were dramatically affected, resulting in a significant decline in ridership due to lockdowns, social distancing measures, and the shift to remote work [1, 6]. Physical distancing, a widely used non-pharmaceutical intervention to prevent the spread of the virus, further reduced the capacity of public transportation services, limiting their ability to meet demand [13].

Factors such as population growth, economic conditions, and environmental policies can also cause fluctuations in public transportation usage. Understanding these changes is crucial for planning resilient and efficient transit systems that can adapt to the evolving needs of cities.

Adjusting service frequency during peak and off-peak hours allows for more efficient use of resources and helps maintain service levels that meet demand without overloading the system. Additionally, rerouting or introducing new transit lines in underserved areas can improve accessibility and attract more users, or conversely, in the case of a pandemic, these changes can discourage usage to help manage public health risks. Infrastructure updates, such as upgrading stations for better crowd flow, can also help transit systems adapt to changes. Therefore, it is important to identify the parts of the transit system that are most affected by changes in ridership to develop these strategies effectively.

This research uses agent-based simulations to analyze passenger interactions within transit systems. While networks traditionally depict routes and stops, improved data collection now allows tracking individual passenger interactions. Smart card data [12] and activity-based travel models [10, 11] capture detailed passenger contact patterns. However, creating accurate real-world contact networks from this data poses challenges, including computational complexity and privacy issues [4, 5].

We used activity-based travel demand models to simulate probable traveler paths in transit networks, considering demand, supply, and service details. These models are complemented by schedule-based transit assignment models, which provide accurate estimates of travel times and waiting times. We analyzed the outputs of transit assignments, considering transit route usage, congestion, and waiting times at transit stops, to identify critical components of the transit network that could be potentially affected by changes in transit demand. Additionally, we processed this data to generate contact networks. We then applied a modularity-based community detection algorithm to extract non-overlapping communities of passengers from the contact network and used these communities to further analyze critical bus routes used by different communities.

2 BACKGROUND
This work is inspired by the methodologies used in previous studies [2, 3, 7]. However, rather than explicitly modeling the spread of disease to identify high-risk transit components, it focuses on examining the components most likely to be affected by changes in ridership trends due to a pandemic or other scenarios.

The contribution of this work is to develop a framework that identifies critical components in terms of factors like changes in transit demand, vehicle capacities, and transit schedules. The insights derived from this framework can be further utilized for modeling transit operations in these scenarios.

3 METHODOLOGY

3.1 Transit simulation model
We used a schedule-based transit assignment model, FAST-TrIPs [8], to simulate passenger movement within the transit network. This model's time-dependent structure captures daily service variability
and focuses on specific transit vehicle trips, which is crucial for accurately reflecting passengers' route choices based on the service schedule. FAST-TrIPs operates on a transit network composed of nodes that represent stops. Trips are connected to specific routes within this network, and transfer links connect nodes where passengers can change vehicles. This setup allows for precise modeling of both vehicle movements and passenger transfers across the transit network.

At the heart of FAST-TrIPs is the Transit Hyperpath Algorithm, which constructs a subnetwork of probable transit routes and assigns probabilities to these routes using a logit route choice model. The algorithm calculates hyperpaths by considering user-preferred arrival times and waiting time windows, allowing for the simulation of passenger journeys with a focus on real-time decision-making and path selection. Passenger movement is then modeled using a pre-estimated route choice model that incorporates factors such as in-vehicle time, waiting time, walking time (for access, egress, and transfers), and transfer penalties.

The transit assignment model generates detailed outputs, such as vehicle load profiles and passenger trajectories. The load profile provides information on the number of passengers boarding and alighting at each stop, along with timestamps, offering insight into passenger counts throughout the route. Passenger trajectories document each passenger's activities, including stop and vehicle IDs with timestamps, enabling the modeling of interactions between passengers.
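To make the route choice step concrete, here is a minimal logit sketch in the spirit of the model described above; the cost weights, transfer penalty, and dispersion parameter theta are illustrative placeholders, not the calibrated FAST-TrIPs values:

    import math

    def path_cost(in_vehicle, waiting, walking, transfers,
                  w_wait=2.0, w_walk=2.0, transfer_penalty=5.0):
        # Generalized cost combining the factors listed above (all weights assumed).
        return in_vehicle + w_wait * waiting + w_walk * walking + transfer_penalty * transfers

    def logit_probabilities(costs, theta=0.5):
        # Each candidate path is chosen with probability proportional to exp(-theta * cost).
        weights = [math.exp(-theta * c) for c in costs]
        total = sum(weights)
        return [w / total for w in weights]

    costs = [path_cost(20, 5, 3, 0), path_cost(15, 10, 5, 1)]
    print(logit_probabilities(costs))  # choice probabilities over two candidate paths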
3.2 Input data
FAST-TrIPs requires various input files, including transit system data stored in GTFS-PLUS format, and transit demand data that contains information about the trips individual passengers make throughout the day, including trip origins, destinations, and preferred arrival times. Additionally, path weights associated with in-vehicle time, waiting time, walking time, and transfer penalties must be specified as input.

In the current study, we used GTFS-PLUS data¹ from the San Francisco Bay Area in California from 2017, which includes 854 routes (covering bus, heavy rail, light rail, and ferry routes) and 36,058 trips serving 6,181 stops over a 24-hour weekday. On the demand side, we used data generated in the same year using the SF-CHAMP travel forecasting tool. Since calibrated path weights were not available for the Bay Area network, we borrowed path weights from a previous study [9] corresponding to the Austin, Texas region.

¹ https://mtcdrive.app.box.com/s/3i3sjbzpsrbhxlwpl4v4vx9b0movferz

3.3 Contact network
As mentioned previously, FAST-TrIPs outputs detailed passenger trajectories that can be further processed to produce a contact network. In this network, each passenger traveling within the transit system is considered a node, and edges connect any two passengers who share a vehicle trip for a positive time period. The vehicle trip refers to a specific route with a specific departure time and is unique to a single vehicle. Each edge is associated with three attributes: the contact start time, the contact duration (in seconds), and the vehicle trip ID.

3.4 Community detection algorithm
We used the Clauset-Newman-Moore greedy modularity maximization algorithm [3] to find the community partition of the contact network with the highest modularity. This community detection algorithm is a hierarchical agglomeration method designed to efficiently identify community structures within large, sparse networks. Unlike traditional methods, which can be computationally expensive, this algorithm operates in a time complexity of O(md log n), where n is the number of vertices, m is the number of edges, and d is the depth of the dendrogram describing the community structure. For many real-world networks, which are sparse and hierarchical (with m ∼ n and d ∼ log n), the algorithm runs in nearly linear time, O(n log² n).
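A compact sketch of both steps: building a contact network from (hypothetical) trajectory records, and partitioning it with greedy modularity maximization. networkx's greedy_modularity_communities implements the Clauset-Newman-Moore method; the record fields and values below are invented for illustration:

    from itertools import combinations
    import networkx as nx
    from networkx.algorithms.community import greedy_modularity_communities

    # Hypothetical trajectory records: (passenger_id, vehicle_trip_id, board_s, alight_s).
    records = [
        ("p1", "trip42", 100, 900),
        ("p2", "trip42", 300, 1200),
        ("p3", "trip7", 0, 600),
        ("p4", "trip7", 100, 500),
    ]

    G = nx.Graph()
    G.add_nodes_from(pid for pid, _, _, _ in records)

    by_trip = {}
    for pid, trip, board, alight in records:
        by_trip.setdefault(trip, []).append((pid, board, alight))

    for trip, riders in by_trip.items():
        for (p, b1, a1), (q, b2, a2) in combinations(riders, 2):
            overlap = min(a1, a2) - max(b1, b2)
            if overlap > 0:  # shared the vehicle trip for a positive time period
                G.add_edge(p, q, start=max(b1, b2), duration=overlap, trip=trip)

    communities = greedy_modularity_communities(G)  # Clauset-Newman-Moore
    print(len(communities), "communities")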
3.5 Limitations

The primary limitation of this study is the size of the demand data. Although the GTFS data originates from a transit network serving millions daily, computational constraints prevented us from simulating real-world demand accurately. Consequently, train routes were not filled beyond half capacity, making it impossible to realistically assess the effects of demand changes on the trains. Additionally, the dataset contains outdated transit system and demand information. However, the proposed method serves as a proof of concept and can be directly applied to more comprehensive travel datasets.

Another limitation is the lack of a detailed comparative study with state-of-the-art methodologies that aim to achieve similar objectives. This choice was due to space constraints, but future research will expand on this comparison, with findings to be published in a full-length journal paper.

4 RESULTS

4.1 Model outputs and contact network

Due to computational limits, the simulations used a reduced number of iterations to reassign passengers to alternative routes. Despite this, most passengers (41,845 out of 44,912) successfully reached their destinations, resulting in 83,280 completed trips.

Figure 1 presents a boxplot of average waiting times, aggregated by passenger, transit route, and transit stop.

Figure 1: Boxplots of average waiting times

The derived contact network consisted of 41,845 passenger nodes and 3,530,995 contact links. The density plot of contact start times, displayed in Figure 2, peaked at 7 AM and 5 PM, reflecting typical weekday commutes. The average contact duration was 18 minutes and 43 seconds. Figure 4 shows the density plot of contact durations.

Figure 2: Density plot of contact start times

The degree distribution of the contact network, shown in Figure 3, indicates an average of 134 contacts per person, with a maximum of 1,011, following a skewed power-law distribution.

Figure 3: Degree distribution

Figure 4: Density plot of contact durations

4.2 Identifying critical components

We first aimed to identify transit stops that may be sensitive to changes in demand. These stops are characterized by two key properties: they serve sufficiently large groups of people, and the average waiting times at these stops are longer than those at most other stops in the transit network. The idea is that such stops could become critical in scenarios where transit demand increases, potentially turning them into bottlenecks. Additionally, in epidemiological situations, passengers waiting at these stops might face an increased risk of infection. To focus on the most relevant stops, we filtered out those serving fewer than 100 people and sorted the remaining stops based on average waiting time. Table 1 provides information on the 10 stops with the longest average waiting times. Most of the listed stops are served by multiple bus routes and have between 100 and 200 passengers waiting at them throughout the day.

Table 1: Critical bus stops

Stop ID             Average waiting time   Number of serving routes
6420                18m 23s                5
6051                14m 21s                8
Folsom/8th          12m 8s                 29
6336                11m 33s                11
103007              11m 4s                 12
7186                8m 21s                 16
103688              7m 51s                 9
3rd/Mendell/Palou   7m 41s                 6
13537               7m 13s                 1
103589              6m 34s                 4

In order to identify critical transit routes, we took two approaches. First, we identified the routes whose vehicle trips are, on average, the most congested. Due to the limited transit demand, here we focused only on bus routes. Table 2 shows the 10 most critical bus routes identified in this way.

Table 2: Most congested bus routes

Route ID                   Capacity   Average number of passengers
samtrans_83D               60         25
samtrans_16A               50         25
samtrans_292NA             75         20
scvta_101                  63         19
scvta_182                  63         18
samtrans_397               75         17
scvta_304                  63         17
scvta_330                  63         17
golden_gate_transit_GG76   63         14
samtrans_295               75         13

The second approach involved identifying critical trips with respect to the community structure of the contact network. The community detection algorithm divided the network into 627 communities, with the largest 10 containing 37% of all passengers in the network. We then identified transit routes used by passengers who appear in at least two of these ten communities and ranked the routes by the number of communities whose passengers travel on them. Table 3 shows the ten most critical routes identified in this way. As can be observed, all of the identified bus and trolleybus routes belong to the San Francisco Municipal Railway (SF Muni) system operating in San Francisco.

Table 3: Bus routes used by multiple communities

Route ID          Community appearance count
sf_muni_49        8
sf_muni_27        7
sf_muni_1         6
sf_muni_14        5
sf_muni_19        5
sf_muni_22        5
sf_muni_10        5
sf_muni_38_33RD   5
sf_muni_2         5
sf_muni_5         5

Figures 5 and 6 summarize the obtained results. Critical stops are marked in red, bus routes used by multiple communities are colored in green, and the most congested bus routes are marked in blue. As observed, the majority of the most congested bus routes connect different cities within the Bay Area or link various cities to San Francisco. For example, several of these routes travel between Contra Costa and Alameda counties, as well as between San Mateo and Alameda counties. Additionally, some routes connect Berkeley and San Francisco, while many others link San Mateo County, Santa Clara County, Marin County, and Petaluma in Sonoma County to San Francisco. Most of the critical bus stops are concentrated in San Francisco, with several others located in the centers of various cities in the Bay Area, including Berkeley, Oakland, San Jose, and Palo Alto. As previously noted, the bus and trolleybus routes connecting different communities that commute in the Bay Area belong to the SF Muni system operating within San Francisco.

Figure 5: Critical components of San Francisco Bay Area transit system

Figure 6: SF Muni transit routes
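As a hedged illustration of the critical-stop filtering described in Section 4.2, a few lines of pandas suffice; the column names and toy records are assumptions, and in the real analysis the threshold is the 100-passenger cutoff from the text.

```python
import pandas as pd

# Hypothetical per-passenger waiting records derived from the trajectories:
# one row per (stop_id, passenger) wait event, wait time in seconds.
waits = pd.DataFrame({
    "stop_id":  ["6420", "6420", "6051", "6051", "6051", "13537"],
    "wait_sec": [1100, 1105, 860, 865, 858, 433],
})

per_stop = waits.groupby("stop_id").agg(
    avg_wait_sec=("wait_sec", "mean"),
    passengers=("wait_sec", "size"),
)
# Keep only stops serving a sufficiently large group (100 in the paper;
# 2 here so the toy data survives) and rank by average waiting time.
critical = (per_stop[per_stop["passengers"] >= 2]
            .sort_values("avg_wait_sec", ascending=False))
print(critical.head(10))
```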
5 CONCLUSION

Using agent-based simulations and network analysis techniques, we identified transit stops and routes that are most vulnerable to changes in demand, whether due to a pandemic or other social and economic factors. Our findings show the importance of focusing on crowded routes and stops with long wait times, as these are likely to become bottlenecks when demand increases. The application of community detection to passenger contact networks further reveals how interconnected different transit routes are within major urban areas, emphasizing the importance of certain routes in keeping public transportation running smoothly.

Additionally, during an epidemic outbreak, crowded routes and stops with long wait times can significantly contribute to the spread of infectious diseases. Areas where many passengers gather for extended periods are likely to become hotspots for infection transmission. Moreover, the interconnected nature of certain transit routes means that an outbreak starting in one part of the network could quickly spread to other areas, especially if key routes are involved. This highlights the need to closely monitor and manage these critical parts of the system to reduce the risk of widespread transmission. Implementing measures such as increasing service frequency, rerouting buses, or enhancing cleaning protocols at these critical points could be crucial in controlling the spread of diseases. These insights can also inform more effective public health strategies, such as prioritizing vaccination or testing efforts in areas served by these high-risk routes and stops.

REFERENCES
[1] Apple Inc. 2020, 2021, 2022. Apple Mobility Trends Reports.
[2] András Bóta, Lauren Gardner, and Alireza Khani. 2017. Identifying Critical Components of a Public Transit System for Outbreak Control. Networks and Spatial Economics 17, 4 (2017), 1137–1159.
[3] András Bóta, Lauren Gardner, and Alireza Khani. 2017. Modeling the spread of infection in public transit networks: a decision-support tool for outbreak planning and control. In Transportation Research Board 96th Annual Meeting.
[4] Ciro Cattuto et al. 2010. Dynamics of Person-to-Person Interactions from Distributed RFID Sensor Networks. PLOS ONE 5 (2010), e11596.
[5] Nicholas A. Christakis and James H. Fowler. 2010. Social Network Sensors for Early Detection of Contagious Outbreaks. PLOS ONE 5 (2010), e12948.
[6] Google LLC. 2020, 2021, 2022. Google COVID-19 Community Mobility Reports. Retrieved from https://www.google.com/covid19/mobility/.
[7] Laszlo Hajdu, András Bóta, Miklós Krész, et al. 2020. Discovering the Hidden Community Structure of Public Transportation Networks. Networks and Spatial Economics 20 (2020), 209–231.
[8] Alireza Khani. 2013. Models and solution algorithms for transit and intermodal passenger assignment (development of fast-trips model). PhD Dissertation. University of Arizona, Tucson, AZ, USA.
[9] Alireza Khani, Tyler J. Beduhn, Jennifer Duthie, Stephen Boyles, and Ehsan Jafari. 2014. A transit route choice model for application in dynamic transit assignment. In Innovations in Travel Modeling. Baltimore, MD.
[10] William H. K. Lam and Hai-Jun Huang. 2003. Combined Activity/Travel Choice Models: Time-Dependent and Dynamic Versions. Networks and Spatial Economics 3, 3 (2003), 323–347.
[11] Matthew J. Roorda, Juan A. Carrasco, and Eric J. Miller. 2009. An integrated model of vehicle transactions, activity scheduling and mode choice. Transportation Research Part B: Methodological 43, 2 (2009), 217–229.
[12] Liang Sun, Kay W. Axhausen, Dong H. Lee, and Manuel Cebrian. 2014. Efficient Detection of Contagious Outbreaks in Massive Metropolitan Encounter Networks. Scientific Reports 4 (2014), 5099.
[13] Alejandro Tirachini and Oded Cats. 2020. COVID-19 and Public Transportation: Current Assessment, Prospects, and Research Needs. Journal of Public Transportation 22, 1 (2020), 1–21.
Automatic Assessment of Bradykinesia in Parkinson's Disease Using Tapping Videos

Matjaž Zupanič (matjaz.zupanic1@gmail.com), University of Ljubljana, Faculty of Computer and Information Science, Ljubljana, Slovenia
Dejan Georgiev (dejan.georgiev@kclj.si), University of Ljubljana, Faculty of Medicine, Ljubljana, Slovenia
Jure Žabkar (jure.zabkar@fri.uni-lj.si), University of Ljubljana, Faculty of Computer and Information Science, Ljubljana, Slovenia

DOI: https://doi.org/10.18690/um.feri.6.2024.15    ISBN 978-961-286-914-4

ABSTRACT

Parkinson's disease is a chronic neurodegenerative illness that severely affects the everyday life of a patient. The severity of Parkinson's disease is assessed using the MDS-UPDRS scale. In this study, we explore the feasibility of automatically evaluating bradykinesia, a key symptom of Parkinson's disease, from tapping videos recorded on smartphones in everyday settings. We collected a dataset of 183 tapping videos from 91 individuals. Videos were assessed by a neurologist into 5 classes of the MDS-UPDRS scale. For data extraction, we employed MediaPipe Hand, which provides a time series of hand skeleton movements. The data was preprocessed to eliminate noise and subsequently used for either feature construction or directly in neural networks. Utilizing manually created features in a multilayer perceptron classifier resulted in 61 % accuracy and an F1 score of 0.61 on our test set. Employing a fully convolutional network, we improved the accuracy to 78 % and the F1 score to 0.75. Additionally, we developed a tool for visualising tapping and displaying key data, providing detailed insights into tapping patterns.

KEYWORDS

bradykinesia evaluation, finger tapping test, Parkinson's disease, machine learning, tapping video analysis
1 INTRODUCTION

Parkinson's disease (PD) is a chronic neurodegenerative condition that profoundly impacts daily life. PD affects 1–2 % [23] of the population over the age of 65. Currently, there are more than 1.2 million cases in Europe [4], and this number is forecast to double in the near future due to the demographic problem of an aging population. Its etiology remains incompletely understood, yet researchers suggest that a combination of genetic and environmental factors contributes to its development. Factors such as exposure to polluted air, pesticides, heavy metals, and head injuries have been associated with an increased risk of Parkinson's disease. The most common symptoms include bradykinesia, which is also the main symptom, tremor, rigidity, impaired postural reflexes, and dementia. There are also numerous other symptoms that can accompany the disease, such as sleep disturbances, depression, loss of smell, and fatigue.

The standardized MDS-UPDRS [6] scale is used to assess the stage of Parkinson's disease. It consists of 4 sections that evaluate both motor and non-motor issues experienced by patients. The finger-tapping test is used to evaluate the severity of bradykinesia. This test involves asking the individual to tap their index finger and thumb as quickly as possible with a maximal span, assessing the number of pauses, time taken, decrease in amplitude, and slowing of speed, all contributing to the final score. It was estimated that up to 25 % of clinical diagnoses of PD are incorrect, due to lack of experience or attention during tests [3].

2 RELATED WORK

The first automated systems for PD detection were based on wearable sensors like gyroscopes and accelerometers [5, 17, 20, 22] or on electromyography sensors [11, 28]. The main issues with sensors are that they are commercially unavailable, require precise placement, and can interfere with the tapping test. Therefore, some researchers have utilized keyboard tapping [1, 19] or tapping on a smartphone screen [7, 8] for data acquisition. Sadikov et al. [21] collected data using digitized spirography, where participants traced an Archimedes spiral on a touchscreen device. Advances in hardware and software have made computer vision combined with machine learning a viable alternative for PD recognition, allowing for home testing using a computer or smartphone. Lainscsek et al. [13] used a nonlinear delay differential equation, with the structure selected by a genetic algorithm. While other researchers used machine learning techniques, most focused on manual feature creation and utilized these features in classification models [10, 18, 29, 30] like support vector machines (SVM) and random forests (RF), or regression models [9] like support vector regression (SVR) and XGBoost. Others employed neural networks (NN) to automatically learn patterns from time series data [2, 14]. Researchers used segmentation neural networks or optical flow for hand data extraction, with MediaPipe Hand [16] becoming popular in later works. Due to limited data, most studies combined some classes, and only a few performed full-scale classification [2, 9, 14, 30].

3 METHODOLOGY

First, we collected and labeled data and preprocessed it to eliminate noise. We initially applied Support Vector Machine (SVM), Multi-Layer Perceptron (MLP), and Random Forest (RF) classifiers with manual feature extraction, then used a Fully Convolutional Network (FCN) directly on the time series.

3.1 Dataset

Since datasets of tapping videos were not publicly accessible, we assembled our own database. From each participant, we collected two videos: one of the left-hand tapping and another of the right hand, as the hands are independent from each other and have their own scores. Videos were recorded at a resolution of 3840 x 2160 or 1920 x 1080 at 60 or 30 fps. All PD patients were recorded in a clinical setting, while some participants of the healthy group were recorded in various other environments.
Table 1: Number of videos in each class.

MDS-UPDRS score    0    1    2    3    4   Sum
Number of videos   49   51   53   23   7   183
Percentage %       26   28   29   13   4   100

We excluded videos with significant tilts, incomplete hand visibility, and participants scoring above 0 on MDS-UPDRS without confirmed PD. In total, we compiled 183 videos from 91 different individuals. The distribution of data between classes can be observed in Table 1. As this study involves clinical data, ethical approval was obtained as part of a larger research project approved by the Neurological Clinic.

3.2 Preprocessing

Since the dataset was collected without any professional equipment, we were dealing with different illuminations, angles, camera tilts, distances between camera and hand, noise, and motion blur. Videos were cut so that only the hand is visible, for privacy and faster processing. We used MediaPipe Hand [16] to extract the thumb and index finger positions and computed the Euclidean distances between their endpoints to generate time series data. Occasionally, there were single-frame misses where MediaPipe detected the previous and next frames but missed the current one. We used linear interpolation to fill these gaps, but avoided interpolating longer gaps to preserve tapping integrity.

To reduce noise caused by inaccurate detections from MediaPipe, we implemented a combination of a low-pass and a moving average filter. Filtering also helped to eliminate tremor, which is not part of MDS-UPDRS but masked tapping. We balanced filtering and signal preservation using weaker filters. The low-pass filter addressed high-frequency noise from tremors and MediaPipe Hand's misalignment, which caused slight shifts between frames even when the hand was still. We implemented the Butterworth low-pass filter due to its flat response. The cutoff frequency was set uniquely for each input based on a specified influence percentage k, as defined by the equation

f_cutoff = (f_max + bandwidth · k) / 2.

Additionally, a moving average with a window size of 5 was used to further smooth the data and reduce erroneous detections, such as spikes. We restricted tapping sequences to a maximum of 15 taps, as the MDS-UPDRS scale does not require longer sequences and participants may tire during extended sessions. We also applied min-max normalization to account for varying distances between the camera and the hand. The finalised graphs of the processed data are depicted in Figure 1.

Figure 1: Amplitude graphs of finger tapping.
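A minimal sketch of this smoothing step, assuming scipy's Butterworth implementation and the cutoff rule quoted above; the influence percentage k and the toy signal are illustrative choices, not the paper's tuned values.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def smooth_taps(distance_series, fs, f_max, bandwidth, k=0.5, order=4):
    """Low-pass + moving-average smoothing of a finger-distance series.

    f_cutoff follows the paper's rule (f_max + bandwidth * k) / 2, where
    k is the influence percentage (the value here is an assumption).
    """
    f_cutoff = (f_max + bandwidth * k) / 2
    b, a = butter(order, f_cutoff, btype="low", fs=fs)
    low_passed = filtfilt(b, a, distance_series)  # zero-phase filtering
    # Moving average with window size 5 to suppress residual spikes.
    return np.convolve(low_passed, np.ones(5) / 5, mode="same")

fs = 30.0  # frames per second of the video
t = np.arange(0, 5, 1 / fs)
signal = np.abs(np.sin(2 * np.pi * 2 * t)) + 0.05 * np.random.randn(t.size)
smoothed = smooth_taps(signal, fs, f_max=4.0, bandwidth=2.0)
```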
3.3 Feature Engineering

In the first method, we created a larger number of features following the MDS-UPDRS scale to comprehensively describe finger tapping. In addition to the Euclidean distances between the endpoints of the thumb and index finger, we included distances between the last joints and absolute wrist movement as additional time series data. We hypothesized that these metrics could provide supplementary but limited insights into finger tapping, although movements in these areas are typically less pronounced. The time series of distances represents the amplitude spectrum, from which we derived 4 additional spectra: velocity, acceleration, frequency, and the spectrum of amplitude peaks. Additionally, we included the spectrum of absolute wrist movement. From these 6 spectra, we extracted 193 statistical features. Our goal was to capture dependencies at global and local levels, describing hesitations, slowdowns, amplitude decreases, data distributions, tapping energy, and other characteristics.

In the second method, we designed features closely aligned with the MDS-UPDRS scale, categorizing them into 3 parts reflecting its criteria. The first part assesses hesitations and freezes, the second part measures reduced speed, and the third part evaluates decreased amplitudes. In our final analysis, we utilized 145 features, with 90 being coefficients from linear regressions used to assess reductions in the amplitudes and velocities of finger tapping. These coefficients were derived from the local maxima of amplitudes and velocities. The remaining 55 features were derived from amplitudes, velocities, their extremes, and autocorrelated velocities and amplitudes.

3.4 Neural network

We opted for an FCN due to its simplicity and efficiency, leaving the exploration of more complex models for future work. We tested the FCN presented by Li et al. [14], using preprocessed time series data directly as input. Since the selected FCN is limited to processing equally long inputs, we padded our time series data with 0 at the end. We later modified the FCN by adding convolutional layers, dropout layers, and early stopping, and adjusted the input layers to handle 2D inputs consisting of amplitudes and velocities. The architecture of our extended FCN is shown in Figure 2.

Figure 2: FCN for classification of bradykinesia.
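For orientation, a minimal 1D fully convolutional network in this spirit might look as follows; the channel sizes, kernel sizes, and dropout rate are assumptions following the common FCN-for-time-series pattern, not the exact FCN+ architecture of Figure 2.

```python
import torch
import torch.nn as nn

class FCN(nn.Module):
    """Minimal fully convolutional network for 5-class tapping scores."""
    def __init__(self, in_channels=2, n_classes=5):  # 2 = amplitude, velocity
        super().__init__()
        self.blocks = nn.Sequential(
            nn.Conv1d(in_channels, 64, kernel_size=8, padding="same"),
            nn.BatchNorm1d(64), nn.ReLU(), nn.Dropout(0.3),
            nn.Conv1d(64, 128, kernel_size=5, padding="same"),
            nn.BatchNorm1d(128), nn.ReLU(), nn.Dropout(0.3),
            nn.Conv1d(128, 64, kernel_size=3, padding="same"),
            nn.BatchNorm1d(64), nn.ReLU(),
        )
        self.head = nn.Linear(64, n_classes)

    def forward(self, x):          # x: (batch, channels, time)
        h = self.blocks(x)
        h = h.mean(dim=-1)         # global average pooling over time
        return self.head(h)

# Zero-padded input: series shorter than 400 frames are padded at the end.
x = torch.zeros(4, 2, 400)
logits = FCN()(x)                  # shape (4, 5)
```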
4 EVALUATION

We created a visualization tool for analyzing finger-tapping dynamics, featuring a built-in video player with a MediaPipe Hand skeleton overlay. Velocity and amplitude graphs on the right side indicate the current frame with a vertical red line (Figure 3). Taps are denoted by red dots, and the tool includes a freeze labelling option. This interactive analysis of tapping dynamics could be useful for neurologists in diagnosing and monitoring motor disorders.

Figure 3: Tool for detailed visualization of finger tapping.

All tests were conducted using 10-fold cross-validation due to the relatively small dataset. The F1 score was calculated as the macro F1, Macro-F1 = (1/C) · Σ_{i=1}^{C} F1_i, where C is the number of classes and F1_i is the F1 score for class i. In Methods 1 and 2, we used SelectKBest [26] for feature selection. As score functions, we tried the ANOVA F-test [24] and mutual information for a discrete target [25], with the latter performing better overall. By experimenting with various feature counts, we identified the optimal number that maximized the model's F1 score.

Due to the low capture speed and the fast movement of the fingers, motion blur was present in the videos. To address this, we tested two NNs for motion blur removal: Ghost-DeblurGAN [15] and PDV_net [27]. However, both methods introduced artefacts in the frames, prompting us to discontinue their use. We also experimented with upscaling the resolution to 200 % of the original size using Video2x [12]. This aimed to enhance image clarity, potentially improving MediaPipe Hand's skeleton detection precision and reducing noise. Testing on a smaller upscaled subset showed minimal differences in classification performance but significantly increased processing time, prompting us to abandon this approach due to time constraints.

Results are detailed in Table 2. Overall, SVM performed the worst in Methods 1 and 2, while RF and MLP achieved the best results, likely due to their superior ability to model complex patterns. Method 1 outperformed Method 2 across all metrics. Approximately 83 % and 73 % of misclassifications from Methods 1 and 2, respectively, differed from the reference tapping scores by exactly 1 class. The FCN from Figure 2 significantly outperformed Methods 1 and 2, achieving a 77 % accuracy. The FCN excels at capturing both local and global dependencies in signals by using filters of varying sizes.

Table 2: Results were obtained via 10-fold cross-validation. FCN refers to Li et al.'s neural network [14], while FCN+ denotes our modified version (Figure 2). MLP was used for Method 1, while RF was used for Method 2.

Model      Accuracy %   F1 %   Precision %   Recall %
FCN        72           62     84            63
FCN+       77           75     88            75
Method 1   61           62     67            58
Method 2   60           55     67            58
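The feature-selection step described above maps almost directly onto scikit-learn; in this sketch the data is random and k = 40 is an arbitrary stand-in for the tuned feature count.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(183, 193))        # 193 statistical features (Method 1)
y = rng.integers(0, 5, size=183)       # MDS-UPDRS scores 0-4

# Keep the k features with the highest mutual information with the score;
# k would be tuned by maximizing cross-validated macro F1, as in the paper.
selector = SelectKBest(score_func=mutual_info_classif, k=40)
X_selected = selector.fit_transform(X, y)
print(X_selected.shape)                # (183, 40)
```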
5 DISCUSSION

Our data set was diverse, assembled from tapping videos of different people across all MDS-UPDRS classes. Since the dataset was collected without any professional equipment, we were dealing with different illuminations, angles, camera tilts, distances between hand and camera, noise, and motion blur. That required a robust approach, with filtering being an important part. To balance noise reduction and signal preservation, we opted for milder filtering.

Similarities were observed across different classes of tapping, as shown in Figure 1, where the graphs appear similar. Various factors contribute to the overlapping of classes. Small differences in tapping styles between adjacent classes mean even minor decreases in velocity or amplitude can significantly impact the final classification. Normalization contributes to reduced differentiation between classes by masking tapping instances with low amplitudes; however, it is necessary to account for variations in recording distances and resolutions. Additionally, recording angles can distort actual finger distances, leading to misleading data representation. Another factor for class overlap is the possibility of human errors in video assessments. Moreover, within the same class, tapping behaviours can vary significantly. For example, in class 4, participants often showed varying abilities: some managed to perform a few taps despite severe difficulties, while others were unable to tap at all.

Direct comparisons with related research may be challenging due to the use of different datasets. When comparing our Methods 1 and 2 with the method by Yu et al. [30], who derived features based on MDS-UPDRS, we achieved lower scores. They reported 80 % accuracy and 79 % recall on a test set of only 15 videos recorded as close to a 90-degree angle as possible. The frame interpolation they used might distort tapping details with artificial data, risking the reliability of their classification outcomes. Islam et al. [9] investigated SVR, LightGBM, and an XGBoost regressor, achieving up to 55 % accuracy. This is lower than the 61 % accuracy we achieved with Method 1, possibly due to their larger database of 489 videos, less effective preprocessing, and a feature set of 65 features that may not fully capture tapping dynamics. The FCN presented by Li and colleagues [14] achieved 72 % accuracy on our dataset, compared to the 80 % reported by the authors. We attribute the slightly lower classification performance on our data to its complexity and heterogeneity, its over 37 % smaller size, and the use of 10-fold cross-validation compared to their 5-fold approach. However, by enhancing the FCN (Figure 2), we improved prediction accuracy to 77 %. Alam et al. [2] reported 81 % accuracy and an F1 score of 0.81 on their test set using a graph neural network (GNN).

6 CONCLUSION

In conclusion, Method 1 with MLP provided the best performance among the manual methods, with better overall metrics and 10 % fewer misclassifications differing by one class from the reference value. The modified FCN+ (Figure 2) further improved accuracy to 77 %. These results are expected, since with manual time-invariant feature extraction it is challenging to capture all unique patterns at various scales in a tapping sequence of around 400 frames. In the future, we plan to expand the dataset, as our class with an MDS-UPDRS score of 4 had a limited population. Increasing the dataset will help achieve more precise results and provide more data to create an unseen test set. Since we had only one labeler, we plan to involve another neurologist to cross-validate our dataset and eliminate potential human errors. We also plan to explore regression models on extracted features to predict continuous severity scores, offering a more detailed evaluation of bradykinesia. Given neural networks' superior performance, we aim to explore graph neural networks (GNN) for handling all data points extracted by MediaPipe.
REFERENCES
[1] Warwick R Adams. 2017. High-accuracy detection of early Parkinson's Disease using multiple characteristics of finger movement while typing. PLOS ONE 12, 11 (2017), 1–20. https://doi.org/10.1371/journal.pone.0188226
[2] Zarif U Alam, Saiful Islam, Ehsan Hoque, and Saifur Rahman. 2023. PULSAR: Graph based Positive Unlabeled Learning with Multi Stream Adaptive Convolutions for Parkinson's Disease Recognition. https://doi.org/10.48550/ARXIV.2312.05780
[3] Nin P S Bajaj, Vamsi Gontu, James Birchall, James Patterson, Donald G Grosset, and Andrew J Lees. 2010. Accuracy of clinical diagnosis in tremulous parkinsonian patients: a blinded video study. Journal of Neurology, Neurosurgery & Psychiatry 81, 11 (2010), 1223–1228. https://doi.org/10.1136/jnnp.2009.193391
[4] Parkinson's Europe. 2024. Parkinson's Statistics. https://parkinsonseurope.org/facts-and-figures/statistics/ Accessed: 7-3-2024.
[5] Joseph P Giuffrida, David E Riley, Brian N Maddux, and Dustin A Heldman. 2009. Clinically deployable Kinesia technology for automated tremor assessment. Movement Disorders 24, 5 (2009), 723–730. https://doi.org/10.1002/mds.22445
[6] Christopher G Goetz, Barbara C Tilley, Stephanie R Shaftman, Glenn T Stebbins, Stanley Fahn, Pablo Martinez-Martin, Werner Poewe, Cristina Sampaio, Matthew B Stern, Richard Dodel, Bruno Dubois, Robert Holloway, Joseph Jankovic, Jaime Kulisevsky, Anthony E Lang, Andrew Lees, Sue Leurgans, Peter A LeWitt, David Nyenhuis, Warren C Olanow, Olivier Rascol, Anette Schrag, Jeanne A Teresi, Jacobus J van Hilten, and Nancy LaPelle. 2008. Movement Disorder Society-sponsored revision of the Unified Parkinson's Disease Rating Scale (MDS-UPDRS): scale presentation and clinimetric testing results. Movement Disorders 23, 15 (2008), 2129–2170.
[7] Dimitrios Iakovakis, Stelios Hadjidimitriou, Vasileios Charisis, Sevasti Bostantjopoulou, Zoe Katsarou, Lisa Klingelhoefer, Heinz Reichmann, Sofia B Dias, José A Diniz, Dhaval Trivedi, Ray K Chaudhuri, and Leontios J Hadjileontiadis. 2018. Motor impairment estimates via touchscreen typing dynamics toward Parkinson's disease detection from data harvested in-the-wild. Frontiers in ICT 5 (2018).
[8] Dimitrios Iakovakis, Stelios Hadjidimitriou, Vasileios Charisis, Sevasti Bostantzopoulou, Zoe Katsarou, and Leontios J Hadjileontiadis. 2018. Touchscreen typing-pattern analysis for detecting fine motor skills decline in early-stage Parkinson's disease. Scientific Reports 8, 1 (2018).
[9] Saiful Islam, Wasifur Rahman, Abdelrahman Abdelkader, Sangwu Lee, Phillip T Yang, Jennifer L Purks, Jamie L Adams, Ruth B Schneider, Earl R Dorsey, and Ehsan Hoque. 2023. Using AI to measure Parkinson's disease severity at home. npj Digital Medicine 6, 156 (2023). https://doi.org/10.1038/s41746-023-00905-9
[10] Jacek Jakubowski, Anna P Chromik, Jolanta Chmielinska, Monika Nojszewska, and Anna K Pruszczyk. 2023. Application of imaging techniques to objectify the Finger Tapping test used in the diagnosis of Parkinson's disease. Bulletin of the Polish Academy of Sciences. Technical Sciences 71 (2023), art. no. e144886. https://doi.org/10.24425/bpasts.2023.144886
[11] Hyoseon Jeon, Woongwoo Lee, Hyeyoung Park, Hong J Lee, Sang K Kim, Han B Kim, Beomseok Jeon, and Kwang S Park. 2017. Automatic Classification of Tremor Severity in Parkinson's Disease Using a Wearable Device. Sensors (Basel) 17, 9 (Sept. 2017).
[12] k4yt3x. 2024. Video2x. https://github.com/k4yt3x/video2x Accessed: 16-02-2024.
[13] Claudia Lainscsek, Peter Rowat, Luis Schettino, Dongpyo Lee, D Song, Cristophe Letellier, and Howard Poizner. 2012. Finger tapping movements of Parkinson's disease patients automatically rated using nonlinear delay differential equations. Chaos: An Interdisciplinary Journal of Nonlinear Science 22, 1 (2012). https://doi.org/10.1063/1.3683444
[14] Zhu Li, Lu Kang, Miao Cai, Xiaoli Liu, Yanwen Wang, and Jiayu Yang. 2022. An Automatic Evaluation Method for Parkinson's Dyskinesia Using Finger Tapping Video for Small Samples. Journal of Medical and Biological Engineering 42, 3 (2022), 351–363. https://doi.org/10.1007/s40846-022-00701-y
[15] Yibo Liu, Amaldev Haridevan, Hunter Schofield, and Jinjun Shan. 2022. Application of Ghost-DeblurGAN to Fiducial Marker Detection. In 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 6827–6832. https://doi.org/10.1109/IROS47612.2022.9981701
[16] Camillo Lugaresi, Jiuqiang Tang, Hadon Nash, Chris McClanahan, Esha Uboweja, Michael Hays, Fan Zhang, Chuo L Chang, Ming G Yong, Juhyun Lee, Wan T Chang, Wei Hua, Manfred Georg, and Matthias Grundmann. 2019. MediaPipe: A Framework for Building Perception Pipelines. ArXiv abs/1906.08172 (2019). https://doi.org/10.48550/arXiv.1906.08172
[17] Alexander Meigal, Saara Rissanen, Mika P Tarvainen, Stefanos Georgiadis, Pasi A Karjalainen, Olavi Airaksinen, and Markku Kankaanpää. 2012. Linear and nonlinear tremor acceleration characteristics in patients with Parkinson's disease. Physiological Measurement 33, 3 (2012), 395.
[18] Adonay S Nunes, Nataliia Kozhemiako, Christopher D Stephen, Jeremy D Schmahmann, Sheraz Khan, and Anoopum S Gupta. 2022. Automatic Classification and Severity Estimation of Ataxia From Finger Tapping Videos. Frontiers in Neurology 12 (2022).
[19] Atemangoh B Peachap, Daniel Tchiotsop, Valérie Louis-Dorr, and Didier Wolf. 2020. Detection of early Parkinson's disease with wavelet features using finger typing movements on a keyboard. SN Applied Sciences 2, 10 (2020).
[20] Cameron N Riviere, Stephen G Reich, and Nitish V Thakor. 1997. Adaptive Fourier modeling for quantification of tremor. Journal of Neuroscience Methods 74, 1 (1997), 77–87. https://doi.org/10.1016/S0165-0270(97)02263-2
[21] Aleksander Sadikov, Jure Žabkar, Martin Možina, Vida Groznik, Dag Nyholm, and Mevludin Memedi. 2015. Feasibility of Spirography Features for Objective Assessment of Motor Symptoms in Parkinson's Disease. In Artificial Intelligence in Medicine, John H Holmes, Riccardo Bellazzi, Lucia Sacchi, and Niels Peek (Eds.). Springer International Publishing, Cham, 267–276.
[22] Arash Salarian, Heike Russmann, Christian Wider, Pierre R Burkhard, François J G Vingerhoets, and Kamiar Aminian. 2007. Quantification of Tremor and Bradykinesia in Parkinson's Disease Using a Novel Ambulatory Monitoring System. IEEE Transactions on Biomedical Engineering 54, 2 (2007), 313–322. https://doi.org/10.1109/TBME.2006.886670
[23] Claudia Schulte and Thomas Gasser. 2011. Genetic basis of Parkinson's disease: inheritance, penetrance, and expression. The Application of Clinical Genetics 4 (2011), 67–80. https://doi.org/10.2147/TACG.S11639
[24] scikit-learn. 2024. f_classif. https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.f_classif.html Accessed: 7-3-2024.
[25] scikit-learn. 2024. mutual_info_classif. https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.mutual_info_classif.html#sklearn.feature_selection.mutual_info_classif Accessed: 7-3-2024.
[26] scikit-learn. 2024. SelectKBest. https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectKBest.html Accessed: 28-02-2024.
[27] Hyeongseok Son, Junyong Lee, Jonghyeop Lee, Sunghyun Cho, and Seungyong Lee. 2021. Recurrent Video Deblurring with Blur-Invariant Motion Estimation and Pixel Volumes. ACM Trans. Graph. 40, 5, Article 185 (2021), 18 pages.
[28] Molly M Sturman, David E Vaillancourt, and Daniel M Corcos. 2005. Effects of aging on the regularity of physiological tremor. Journal of Neurophysiology 93, 6 (2005), 3064–3074.
[29] Stefan Williams, Samuel D Relton, Hui Fang, Jane Alty, Rami Qahwaji, Christopher D Graham, and David C Wong. 2020. Supervised classification of bradykinesia in Parkinson's disease from smartphone videos. Artificial Intelligence in Medicine 110 (2020), 101966. https://doi.org/10.1016/j.artmed.2020.101966
[30] Tianze Yu, Kye W Park, Martin J McKeown, and Jane Z Wang. 2023. Clinically Informed Automated Assessment of Finger Tapping Videos in Parkinson's Disease. Sensors 23, 22 (2023). https://doi.org/10.3390/s23229149
Exploring Mathematical Decision-Making Through EEG Analysis

Riste Micev (riste.micev1@gmail.com), University of Primorska, Faculty of Mathematics, Natural Sciences and Information Technologies, Koper, Slovenia
Peter Rogelj (peter.rogelj@upr.si), University of Primorska, Faculty of Mathematics, Natural Sciences and Information Technologies, Koper, Slovenia

DOI: https://doi.org/10.18690/um.feri.6.2024.16    ISBN 978-961-286-914-4

ABSTRACT

In this study, mathematical decision-making tasks were used to provide further details on the flow of information across a number of brain regions, with the objective of finding out whether connectivity patterns are informative in predicting decisional outcomes. The experiment consisted of showing 50 mathematical expressions to each participant, and they decided on their correctness by pressing buttons. Neural activity and button presses were recorded by means of the g.tec Nautilus EEG device, equipped with 64 electrodes. A detailed epoch analysis was conducted with regard to participant responses. Advanced signal analysis techniques were applied, including Granger causality, Phase Locking Value, and the Complex Pearson Correlation Coefficient. This research aims to determine how these tools could distinguish events from states, to become aware of their limitations, and to develop novel analysis techniques for better discrimination of brain processes. This research is specifically focused on using mathematical reasoning as a model to study decision-making processes. Our objective is to test existing, and develop novel, methods for gaining a deeper understanding of the brain dynamics involved in discrete cognitive activities.

KEYWORDS

EEG, mathematical decision-making, neural connectivity, connectivity analysis, neural signal classification, EEGNet
1 INTRODUCTION

1.1 Background

Electroencephalography (EEG) devices are core tools in neuroscience for the monitoring of brain activity through the detection of electrical potentials at different places on the scalp [4]. They find applications in a wide variety of both clinical and research settings during the investigation of brain activity and connectivity. The ability to understand brain processes is crucial for advancements in neuroscience, medical diagnostics, and brain-computer interfaces. The identification and improvement of methods that are capable of classifying events and explaining the underlying decision process is also of great importance.

In this study we aim to set up the whole pipeline for conducting such research, which consists of data acquisition and analysis. As our brain process of interest we selected mathematical reasoning, which exemplifies decision-making processes. It was selected because of its complexity, which enables layered analysis of subprocesses, as mathematical thinking involves complex cognitive processes that engage multiple brain regions. Mathematical decision-making tasks require the integration of numerical processing, working memory, and logical reasoning, making them ideal for studying brain connectivity.

1.2 Objectives

Our primary objective with this research is to assess existing techniques for connectivity analysis and to develop a comprehensive pipeline for the analysis of brain processes (during mathematical decision-making tasks), with a focus on the creation and refinement of methodologies that would be able to classify and explain these processes. Through the comprehensive analysis we also aim to validate two specific hypotheses.

• H1 - Mathematical thinking causes unique connectivity patterns, differentiable from resting-state brain activity.
• H2 - True and false answers can be distinguished by their EEG signals.

The motivation behind this study is to contribute to the understanding of cognitive processes by providing insights into the neural dynamics of decision-making.

2 LITERATURE REVIEW

EEG has established itself as an invaluable tool in cognitive neuroscience, particularly for exploring brain activity in real time. Its application in understanding the neural mechanisms of mathematical decision-making has drawn considerable interest due to the dynamic and complex nature of the task. Previous studies have demonstrated that specific brain regions, particularly within the frontal and parietal lobes, are significantly active during mathematical cognition, reflecting the intricate process of problem-solving and numerical reasoning [1, 2].

Basic methods for EEG analysis rely on statistical analysis of independent electrode signals and do not enable reliable differentiation of complex brain activities. This can be achieved by additionally considering the mutual signal interdependence as a reflection of the utilization of brain networks, known as brain connectivity analysis. There are several accepted brain connectivity analysis methods, which include the Phase Locking Value (PLV), the Weighted Phase Locking Index (wPLI), the Complex Pearson Correlation Coefficient (CPCC) [7], and Granger Causality (GC).
Most methods, including PLV, wPLI, and CPCC, are not directional and rely on analyzing phase differences between the electrode signals. GC is a directional method, developed by Clive Granger in the 1960s, that determines whether one time series can help predict another. Applied to EEG data, it can reveal directional influences between different neural regions covered by the corresponding EEG electrodes.

Granger causality has been widely used in neuroscience to explore the temporal dynamics of brain activity. For example, it has been applied to EEG signals to investigate the functional connectivity between different brain areas during various cognitive states. Recent research by Seth, Barrett, and Barnett (2015) [6] has further demonstrated the effectiveness of GC in identifying directed functional interactions in neuroscience and neuroimaging time-series data. Their findings indicate that GC can reveal insights into the functional circuits involved in perception, cognition, and behavior. This research also emphasizes both the theoretical foundations and practical applications of GC while discussing its limitations and computational strategies, thus solidifying its role as a crucial tool in neuroscience.

With the advances in artificial intelligence, the use of artificial neural networks also affects EEG analysis. One of the most promising artificial neural network architectures for the classification of EEG data is EEGNet [3], a compact convolutional neural network designed for EEG-based brain-computer interfaces. Recently, it has been shown that neural networks can contribute to the understanding of underlying processes by computing saliency maps [5]. As such, artificial neural networks could also be extended and utilized to reveal connectivity patterns.

3 METHODOLOGY

3.1 Participants

For the purpose of the study, we recruited 15 participants from the university. Each participant provided written informed consent before participating in this experiment. The work was approved by the university's ethical committee to ensure the study conformed to ethical standards for studies involving human participants.

3.2 Equipment

• EEG Headset: g.tec Nautilus EEG device with 64 channel electrodes.
• Base Station: Connected to the EEG headset for data transmission.
• Trigger Box: Connected to the base station, equipped with two response buttons.
• Optical Sensor: Connected to the trigger box to detect changes in the visual stimuli.
• Recording Software: g.recorder for capturing and storing EEG data.

3.3 Procedure

The experiment was set up in the following way. A participant was seated comfortably in a noise-free, dimly lit room to help eliminate external factors that might cause discomfort. An EEG headset was fitted on the head of the participant, making sure that the contact of all the electrodes on the scalp was good, to ensure high-quality signals. The headset was then connected to the base station and trigger box.

The experiment consisted of 50 mathematical equations that were shown for 10 seconds each on a computer screen, where the research participants had to determine whether the equation was correct or incorrect. Responses were marked by pressing one of the two corresponding buttons connected to the trigger box. At the end of each equation, there was a resting phase of 3 seconds during which the subjects could rest before the next equation appeared.

An optical sensor was used to capture the exact display time of each equation, ensuring correct synchronization of the visual presentation with the EEG data. This setup allowed for exact timing of participant responses relative to the presentation of the equations.

In this experiment we recorded EEG data using the g.recorder software, which captured the continuous EEG signals on all 64 channels at a sampling rate of 250 Hz. The software also recorded the presentation timings of the stimuli and participant responses according to the detection of the optical sensor and trigger box. This setup ensured that all relevant events were accurately time-stamped and synchronised with the EEG data.

Each of the participants completed the experiment in an individual session, which in total lasted approximately 15 minutes. The data was stored safely for later preprocessing and analysis.
3.4 Data Collection

Figure 1 shows the raw EEG data recorded from one of the subjects while performing the mathematical decision-making task. The EEG signals, captured from 64 channels, are shown along with their respective labels on the y-axis, which represent electrode positions at different locations on the scalp, while the x-axis represents time in seconds.

Figure 1: Visualisation of the raw signals data.

In this visualization, specific triggers are marked to indicate important events during the experiment. The green line represents a trigger at the moment the equation on screen changes, thus signaling the beginning of a new arithmetic problem. This trigger is important for synchronizing the EEG data with the exact moment each equation is presented to the participant, thus providing the ability to analyze the neural response to the stimulus very precisely. The red line marks the trigger corresponding to the event of a user pressing the "incorrect" button, signaling his/her decision that the equation presented is wrong. Although not shown in the current image, a blue line is used as a trigger to mark the event when a participant presses the "correct" button, indicating their decision that the equation is correct.

3.5 Data Preprocessing

One of the most important steps in ensuring the quality and reliability of the recorded signal before detailed analysis is the preprocessing of the EEG data. MATLAB with the EEGLAB toolbox was used for this purpose, where advanced functionalities were applied to handle the intricate nature of EEG data. The preprocessing pipeline started by filtering out all frequencies of the raw EEG data outside the frequency range of interest. This was accomplished with a bandpass filter with limiting frequencies of 0.1 Hz and 45 Hz. This filtering step was important to avoid noise and other effects due to muscle activity, electrical interference, etc.

After filtering, the EEG data was re-referenced to the common average reference. This involved averaging the signal from all electrodes and subtracting this average from each individual electrode's signal to give clarity to the signal and remove common noise. Common average referencing is a standard operation in EEG preprocessing; it helped to normalize data across different channels.

Artifacts were removed by Independent Component Analysis (ICA), where the EEG signal was decomposed into independent components. With the help of ICA and the ICLabel add-on in MATLAB, components related to common artifacts due to eye movements, blinks, and muscle activity were identified and isolated. Removing these artifact components from the data ensured that the remaining signals are more representative of the true neural activity.
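The paper's pipeline uses MATLAB/EEGLAB; assuming the recordings were exported to a format MNE-Python can read, the equivalent steps (band-pass filter, common average reference, ICA) correspond to a few calls. The file name and excluded component indices below are hypothetical.

```python
import mne

# Hypothetical export of the g.recorder data to an MNE-readable file.
raw = mne.io.read_raw_fif("subject01_math_task_raw.fif", preload=True)

# Band-pass filter to the 0.1-45 Hz range of interest.
raw.filter(l_freq=0.1, h_freq=45.0)

# Re-reference to the common average of all 64 electrodes.
raw.set_eeg_reference("average")

# ICA-based artifact removal (eye movements, blinks, muscle activity).
ica = mne.preprocessing.ICA(n_components=20, random_state=0)
ica.fit(raw)
ica.exclude = [0, 1]          # indices of artifact components (assumed)
ica.apply(raw)
```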
3.6 Hypothesis Testing

The basic idea of our hypothesis testing approach revolves around developing and training classification methods on epochs that we define specifically around key events (equation change, incorrect/correct marking). For example, for testing the first hypothesis H1, two primary states can be defined:

• Rest - Epochs taken from a 3-second window just before a new equation appears, marked by the green line in our recording setup. This is a period not active for making any judgment, which in turn gives our baseline or rest state.
• Active - In contrast, active-state epochs are taken from a 3-second window just prior to the participants' responses, since these active states are thought to carry neural signatures related to the cognitive processes of judgment and decision making on the mathematical expressions.

In this way we can directly compare neural activity in both "decision-making" and "resting" conditions, with the specific objective of identifying distinct patterns that could validate our hypotheses about the differential brain connectivity in different cognitive states.

For the second hypothesis, H2, testing adapts the methodologies developed for H1 but focuses on epochs particularly related to the correctness of the participant's response. It is hypothesized that EEG signals could differentiate between true and false answers of participants in mathematical decision-making tasks. The epochs were extracted in a similar way, with 3-second intervals before the blue and red triggers in the dataset.

3.7 Connectivity Matrices

Connectivity matrices serve as a fundamental tool in neuroscience for visualizing and quantifying the intricate patterns of neural interactions within the brain. These matrices can be derived with the use of the different connectivity analysis techniques mentioned above.

In Figure 2 we can see the Granger causality matrices obtained from one of the subjects' EEG recordings in the resting and active cognitive states, respectively. Each matrix describes the directional influences between pairs of EEG electrodes over the scalp. The x-axis labels denote the influencing electrodes, and the y-axis labels indicate the influenced electrodes. Each cell in the matrix thus corresponds to a pair of electrodes; the color of each cell reflects the strength of the causal influence from the electrode on the x-axis to the electrode on the y-axis. The color scale, ranging from 0 to 0.18, is provided at the right side of the matrices. The colors change from cool colors like blue, indicating very weak causal influence, to warm colors like yellow, representing very strong influences. This scale helps the eye in assessing the strength and distribution of connectivity across the brain.

Figure 2: Visualisation of the connectivity matrices of rest and active states.
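Of the non-directional measures listed in Section 2, the Phase Locking Value is the simplest to sketch; note that Figure 2 shows Granger causality matrices, so the code below is a neighboring illustration of a connectivity matrix rather than the figure's exact computation.

```python
import numpy as np
from scipy.signal import hilbert

def plv_matrix(epoch):
    """Phase Locking Value between every electrode pair.

    epoch: array of shape (n_channels, n_samples) for one 3-second window.
    PLV(i, j) = |mean_t exp(1j * (phi_i(t) - phi_j(t)))|, ranging from 0 to 1.
    """
    phase = np.angle(hilbert(epoch, axis=1))        # instantaneous phase
    n = epoch.shape[0]
    plv = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            plv[i, j] = np.abs(np.mean(np.exp(1j * (phase[i] - phase[j]))))
    return plv

epoch = np.random.randn(64, 750)   # 64 channels, 3 s at 250 Hz
print(plv_matrix(epoch).shape)     # (64, 64), symmetric, ones on the diagonal
```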
4 RESULTS

After segmenting the epochs and preparing the EEG data from all participants, we used the EEGNet neural network to classify resting and active states, in order to validate H1. The network was trained with 80% of the data and tested with the remaining 20%. This resulted in a classification accuracy of about 84%, showing that distinct neural connectivity patterns are present during mathematical decision-making tasks compared to resting states. The findings support our hypothesis that mathematical thinking causes unique connectivity patterns, differentiable from resting-state brain activity. This suggests the promising capability of EEGNet to discriminate between rest and active states based on the neural data collected around the event-defined epochs.

We also did some testing on the second hypothesis, H2. Initial tests using the EEGNet neural network on epochs related to correct and incorrect responses resulted in classification accuracies of about 50%, which is clearly insufficient. These results suggest two possible explanations: either the EEG signals do not contain enough distinguishing information, or the applied methods are not yet optimized to detect subtle differences in brain activity.

Given these results, we will continue to refine our analytical methods and to explore alternative models for a better representation of neural dynamics. Hypothesis H1 has proven to be a more accessible goal, while hypothesis H2 presents a greater challenge. Should we confirm H2, it could revolutionize how we estimate knowledge and decision-making processes based on neural data.

5 CONCLUSION

The main objective of this research is the development and enhancement of methodologies to analyse EEG signals during cognitive tasks, with a special emphasis on mathematical decision-making. The strategy taken in this research provides a model for future work on more complex cognitive phenomena. It indicates the need for precise acquisition of data, sophisticated preprocessing strategies, and new analytical techniques in an attempt to capture and correctly interpret the activity in the brain.

In summary, this research adds a great deal to the development of methodologies that further improve our understanding of cognitive processes, and pushes the boundaries of how we can interact with technology using Brain Computer Interfaces (BCI) and analyse neurological conditions. Further evolution of these methods is likely to close the gap between human cognitive functions and machine interpretation, setting the stage for possible future advances that may change neurological healthcare and technology interfacing.

REFERENCES
[1] Mike X Cohen. 2014. Analyzing Neural Time Series Data: Theory and Practice. The MIT Press.
[2] S. Dehaene, N. Molko, L. Cohen, and A. J. Wilson. 2004. Arithmetic and the brain. Current Opinion in Neurobiology 14, 2 (2004), 218–224.
[3] V. J. Lawhern et al. 2018. EEGNet: a compact convolutional network for EEG-based brain-computer interfaces. Journal of Neural Engineering 15, 5 (2018), 056013.
[4] G. A. Light, L. E. Williams, F. Minow, J. Sprock, A. Rissling, R. Sharp, N. R. Swerdlow, and D. L. Braff. 2010. Electroencephalography (EEG) and Event-Related Potentials (ERPs) with Human Participants. Current Protocols in Neuroscience 52 (2010), 6.25.1–6.25.24.
[5] S. Mortier et al. 2023. Classification of Targets and Distractors in an Audiovisual Attention Task Based on Electroencephalography. Sensors 23, 23 (2023), 9588.
[6] A. K. Seth, A. B. Barrett, and L. Barnett. 2015. Granger Causality Analysis in Neuroscience and Neuroimaging. Journal of Neuroscience 35, 8 (2015), 3293–3297.
[7] Z. Šverko, M. Vrankić, S. Vlahinić, and P. Rogelj. 2022. Complex Pearson Correlation Coefficient for EEG Connectivity Analysis. Sensors (Basel) 22, 4 (2022), 1477.
Analysis of Verbal Fluency in Slovenian Language in Patients With Schizophrenia

Mila Marinković (mm9136@student.uni-lj.si), University of Ljubljana, Faculty of Computer and Information Science, Ljubljana, Slovenia
Polona Rus Prelog (polona.rus@psih-klinika.si), University Psychiatric Hospital Ljubljana, Ljubljana, Slovenia
Martina Zakšek (martina.zaksek@gmail.com), Splošna bolnišnica Celje, Celje, Slovenia
Jure Žabkar (jure.zabkar@fri.uni-lj.si), University of Ljubljana, Faculty of Computer and Information Science, Ljubljana, Slovenia

DOI: https://doi.org/10.18690/um.feri.6.2024.17    ISBN 978-961-286-914-4

ABSTRACT

This study investigates verbal fluency in the Slovenian language among individuals diagnosed with schizophrenia compared to healthy controls. Participants completed a verbal fluency task, which involved producing as many words as possible starting with a specific letter in Slovenian within a set time limit. The analysis included statistical testing and semantic similarity measures using FastText embeddings. Significant differences were found between the groups in terms of the number of correct and total words produced. While semantic similarity showed minimal differences, global optimality divergence revealed notable disparities. These findings highlight the utility of comprehensive analytical approaches in understanding verbal fluency deficits in schizophrenia, emphasizing the need for nuanced methods to capture the complexity of cognitive impairments in this population.

KEYWORDS

verbal fluency, schizophrenia, Slovenian language, semantic analysis, statistical analysis
1 INTRODUCTION

Verbal fluency tests are widely used to assess cognitive function and linguistic abilities in various clinical populations, including individuals with schizophrenia. These tests, which require participants to say words based on specific criteria, provide valuable insights into semantic memory, executive function, and language processing capabilities.

Schizophrenia is a chronic mental disorder characterized by symptoms such as cognitive disorganization, impaired semantic processing, and executive dysfunction. These symptoms often manifest as deficits in verbal fluency, where affected individuals typically produce fewer words and commit more errors, such as repetitions, intrusions, and neologisms. Understanding these verbal fluency deficits is crucial for developing targeted cognitive and linguistic interventions.

Previous studies [1–8] have documented that individuals with schizophrenia exhibit notable impairments in verbal fluency tasks, producing fewer words and making more errors compared to healthy controls. To address this gap, we employed a comprehensive analytical approach combining traditional statistical tests with advanced semantic similarity measures using FastText embeddings. We hypothesized that while local semantic relationships might not differ significantly between groups, broader semantic coherence and the structural organization of speech would be markedly impaired in schizophrenia. By leveraging advanced techniques such as FastText embeddings, we aimed to uncover deeper insights into the semantic characteristics of verbal fluency in schizophrenia.

In this paper, we present a detailed analysis of verbal fluency performance in individuals with schizophrenia compared to healthy controls. We discuss the implications of our findings for understanding the cognitive and linguistic disruptions associated with schizophrenia and propose directions for future research to further explore these impairments.

2 RELATED WORK

The analysis of verbal fluency in individuals with schizophrenia has been extensively researched to understand the cognitive and neural mechanisms underlying the disorder. This study draws upon several key pieces of related work that have influenced our methodology and analytical approaches.

Nour et al. [8] investigated semantic trajectories in schizophrenia by analyzing verbal fluency tasks using a computational model of word embeddings. Their study highlighted the reduced semantically guided word selection in people with schizophrenia and its correlation with hippocampal disruptions. This approach underscored the importance of semantic distance in understanding cognitive disorganization in schizophrenia and inspired our use of FastText to compute word embeddings and analyze semantic distances between words generated during verbal fluency tasks.

Galaverna et al. [1] conducted a detailed analysis of errors in verbal fluency tasks among individuals with chronic schizophrenia. Their research emphasized the prevalence of perseverative and intrusion errors in verbal fluency tasks, highlighting significant moderators such as the severity of negative symptoms, formal thought disorder, and pharmacological variables. This study provided crucial insights into the patterns of errors (intrusions, repetitions, neologisms) in verbal fluency tasks, which are essential for understanding the cognitive deficits associated with schizophrenia.

Ojeda et al. [5] explored the relationship between verbal fluency and other cognitive domains in patients with schizophrenia and healthy controls. Their findings indicated that while healthy controls' verbal fluency was primarily predicted by processing speed, in patients with schizophrenia it was more closely related to working memory. This study highlights the differing cognitive mechanisms underlying verbal fluency performance in schizophrenia and informed our consideration of different cognitive variables in our analysis.
This study highlights the differing cognitive mechanisms underlying verbal fluency performance in schizophrenia and informed our consideration of different cognitive variables in our analysis.

Grimes et al. [2] examined the stability of verbal fluency abilities in outpatients with schizophrenia over a one-year period. They found that verbal fluency abilities remained stable over time, providing evidence against significant longitudinal decline in these cognitive domains among individuals with chronic schizophrenia. This study's findings on the stability of verbal fluency informed our understanding of the temporal consistency of cognitive impairments in schizophrenia.

Lehtinen et al. [4] presented a systematic administration and analysis approach for verbal fluency tasks, highlighting the importance of detailed scoring guidelines and exploring various underlying cognitive processes. Their method provided strong inter-rater reliability and demonstrated significant effects of education and gender on verbal fluency performance, reinforcing the need for comprehensive analysis beyond total scores. This study's emphasis on clustering, switching, and error analysis informed our analytical approach to understanding the cognitive processes involved in verbal fluency tasks in schizophrenia.

Kosmidis et al. [3] studied verbal fluency in institutionalized patients with schizophrenia, focusing on age-related performance decline. They found that elderly patients exhibited a disproportionate decline in phonemic fluency compared to younger patients, while semantic fluency remained relatively stable. This research highlighted the impact of aging on cognitive strategies like clustering and switching, which are critical for verbal fluency tasks. The findings underscore the importance of considering age and institutionalization duration when analyzing verbal fluency in schizophrenia.

Nogueira et al. [7] provided normative data on semantic and phonemic verbal fluency tasks for a European Portuguese population, considering the effects of age, gender, and education. Their study demonstrated that age and education significantly affect verbal fluency performance, while gender has a more variable impact. This research supports the need for demographic adjustments in verbal fluency assessments and informed our methodology in adjusting for these variables in our analysis.

These studies collectively emphasize the multifaceted nature of verbal fluency impairments in schizophrenia, necessitating the use of advanced analytical techniques to capture the underlying cognitive and linguistic disruptions. Our approach, combining traditional statistical tests with semantic similarity measures using FastText embeddings, aims to build on this foundation to provide deeper insights into the verbal fluency deficits in schizophrenia.

3 METHODOLOGY
This section outlines the methodology used in our study to investigate verbal fluency deficits in individuals with schizophrenia compared to healthy controls. We describe the participants, procedures, data collection, and data analysis methods employed to gather and analyze the data.

3.1 Participants
The study involved a total of 126 participants, divided into two groups: 58 individuals diagnosed with schizophrenia and 68 healthy controls. The participants were matched for age and gender to ensure comparability between the groups. All participants were 18 years or older. Exclusion criteria included an inability to speak Slovenian, a history of intellectual disability, organic brain conditions, or substance abuse. For healthy controls, additional criteria included no history of psychiatric disorders or substance abuse. The study was approved by the Medical Ethics Committee of the Republic of Slovenia, and all participants provided written informed consent.

3.2 Procedure
Participants were asked to perform a verbal fluency task in which they had to say as many words as possible that start with the letter "L" in Slovenian within one minute. The task was administered individually in a quiet room to minimize distractions. All responses were recorded for subsequent analysis. Demographic data, including age, gender, education level, marital status, employment status, and hospitalization history, were collected prior to the test to provide a comprehensive overview of the sample population.

3.3 Data Collection
All data were recorded and stored in a secure database. The verbal fluency responses were transcribed and annotated for analysis. Each word was evaluated for its accuracy and categorized as correct, intrusion (an incorrect word not fitting the criteria), repetition (same word used more than once), or neologism (made-up word). The timestamps for each word were also recorded to facilitate temporal analysis. Additionally, FastText embeddings were computed for each word to enable semantic similarity analysis.
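To make the annotation scheme above concrete, the following minimal Python sketch labels a transcribed response sequence. It is our own illustration, not the authors' tooling; the annotate() helper, the lexicon file name, and the example words are hypothetical.

    # Illustrative sketch (not the authors' code): label each transcribed
    # word as correct, intrusion, repetition, or neologism, assuming a
    # plain-text Slovenian word list with one entry per line.
    def annotate(words, lexicon, letter="l"):
        seen = set()
        labels = []
        for word in words:
            w = word.lower()
            if w in seen:
                labels.append("repetition")   # same word used more than once
            elif not w.startswith(letter):
                labels.append("intrusion")    # does not fit the letter criterion
            elif w not in lexicon:
                labels.append("neologism")    # not found in the lexicon
            else:
                labels.append("correct")
            seen.add(w)
        return labels

    # Hypothetical lexicon file: one Slovenian word per line.
    with open("slovenian_lexicon.txt", encoding="utf-8") as f:
        lexicon = {line.strip().lower() for line in f}

    print(annotate(["letalo", "ladja", "letalo", "miza"], lexicon))
    # -> ['correct', 'correct', 'repetition', 'intrusion']

The order of the checks is one of several reasonable conventions; for example, a repeated intrusion is counted here as a repetition rather than as a second intrusion.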
Table 1: Demographic Characteristics of the Participants.

Measure                                Schizophrenia Patients    Healthy Controls
Total Participants                     58                        68
Average Age (years)                    46.05                     46.71
Prevalent Education Level              Primary school            High school
Avg. Elementary School grade-point     3.57                      4.72
Avg. Secondary School grade-point      3.46                      4.39
Male Distribution                      29                        35
Female Distribution                    29                        33

Demographic data, summarized in Table 1, were analyzed to ensure that the groups were comparable in terms of age and gender, so that any differences observed in verbal fluency performance are less likely to be confounded by these factors. Although there were differences in the average education level between the schizophrenia and healthy control groups, we verified that within each education level there were no significant differences between the groups, ensuring that education level did not confound the verbal fluency comparisons.

3.4 Data Analysis
The collected data underwent various analyses to explore the differences in verbal fluency between individuals with schizophrenia and healthy participants. The following analytical techniques were employed:
(1) Statistical Analysis: A t-test was conducted to compare the total number of words and the number of correct words produced by the two groups, where correct words are those that are neither intrusions, repetitions, nor neologisms. Before using the t-test, we checked that the data were normally distributed. This statistical test provided evidence of differences in verbal fluency performance between individuals with schizophrenia and healthy controls.
(2) Semantic Similarity Analysis: For this study, we used FastText embeddings to capture semantic relationships between words produced during the verbal fluency tasks. FastText is a word embedding technique designed to capture the semantic meaning of words. It breaks words into character-level n-grams, which allows it to capture more contextual information and better handle rare or morphologically complex words. This makes FastText particularly effective for languages like Slovenian, as it can better represent linguistic nuances and provide more meaningful embeddings for semantic similarity analysis. Using these embeddings, we calculated cosine similarity, mean semantic distance, local optimality divergence, and global optimality divergence to capture the semantic relationships and coherence of word sequences (see the sketch at the end of this section).
By combining these analytical approaches, our study aims to provide a comprehensive understanding of the verbal fluency impairments associated with schizophrenia, contributing valuable insights into cognitive functioning.
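As an illustration of the embedding-based measures in item (2), the sketch below loads the publicly released pretrained Slovenian FastText model (cc.sl.300.bin from fasttext.cc) and computes cosine similarities between consecutive responses together with a mean semantic distance. The operationalization over consecutive word pairs is our assumption, and the optimality-divergence measures are omitted because their exact formulation is not spelled out here.

    import fasttext
    import numpy as np

    # Pretrained Slovenian vectors; FastText's character n-grams mean that
    # even rare words or neologisms still receive a vector.
    model = fasttext.load_model("cc.sl.300.bin")

    def cosine(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    def consecutive_similarities(words):
        # Cosine similarity between each pair of consecutive responses.
        vecs = [model.get_word_vector(w) for w in words]
        return [cosine(a, b) for a, b in zip(vecs, vecs[1:])]

    def mean_semantic_distance(words):
        # Assumed definition: mean cosine distance over consecutive pairs.
        return 1.0 - float(np.mean(consecutive_similarities(words)))

    words = ["letalo", "ladja", "luna", "lipa"]   # placeholder responses
    print(np.mean(consecutive_similarities(words)))
    print(mean_semantic_distance(words))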
4 RESULTS
The results of the verbal fluency tests conducted on both individuals with schizophrenia and healthy individuals are summarized in this chapter. Figures 1 and 2 display the most frequently spoken words by each group. Figure 1 presents the top five words spoken by healthy individuals, along with the occurrences of these words among people with schizophrenia. Similarly, Figure 2 illustrates the top five words spoken by individuals with schizophrenia, along with the occurrences of these words in the healthy group.

Figure 1: Top 5 Words by Healthy People.

Figure 2: Top 5 Words by Individuals with Schizophrenia.

In addition to the graphical representation of the top words, Table 2 provides a summary of various key metrics from the verbal fluency tests. This includes the total number of different words, the number of unique words, and the counts of intrusion, repetition, and neologism words for both groups.

Table 2: Comparison of Verbal Fluency Test performance between individuals with schizophrenia (SP) and healthy controls (HC). The bottom rows summarize the total number of different words across all users and overlapping words between the groups.

Total number of                               SP     HC
Different words                               176    247
Unique words                                  60     131
Intrusion words                               44     8
Repetition words                              21     8
Neologism words                               20     0
Different words across all users              307
Overlapping words between observed groups     116

The analysis indicates significant differences in verbal fluency between healthy individuals and those with schizophrenia. The following sections provide a detailed examination of the data, including statistical analyses and further discussion of the implications of these findings.

4.1 Statistical Analysis
The statistical analysis compared the total number of words and the number of correct words (no intrusion, no repetition, no neologism) produced by participants in both groups.

Table 3: Mean and Standard Deviation (SD) of Words Produced by Healthy Controls (HC) and Individuals with Schizophrenia (SP).

Measure          HC (Mean ± SD)    SP (Mean ± SD)
Correct Words    10.32 ± 4.24      6.86 ± 4.19
Total Words      10.69 ± 4.32      8.52 ± 5.12

The results of the t-test, after confirming that the data are normally distributed, are summarized in Table 4.

Table 4: T-test Results for Correct and Total Words by Group.

Measure          t-statistic    p-value
Correct Words    4.77           < 0.001
Total Words      3.04           0.002

These results indicate significant differences between the groups, with healthy participants producing more correct and total words than participants with schizophrenia.
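For reference, the comparison reported in Tables 3 and 4 corresponds to a standard normality check followed by an independent two-sample t-test, as sketched below; the per-participant counts are placeholders, not the study data.

    from scipy import stats

    hc_correct = [12, 9, 14, 8, 11, 13]   # healthy controls (placeholder values)
    sp_correct = [7, 5, 9, 4, 6, 8]       # schizophrenia patients (placeholder values)

    # Shapiro-Wilk test: a non-significant p-value is consistent with
    # normality and justifies using the t-test.
    for name, sample in (("HC", hc_correct), ("SP", sp_correct)):
        w, p = stats.shapiro(sample)
        print(f"{name}: W = {w:.3f}, p = {p:.3f}")

    # Independent two-sample t-test on the correct-word counts.
    t, p = stats.ttest_ind(hc_correct, sp_correct)
    print(f"t = {t:.2f}, p = {p:.4f}")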
4.2 Semantic Similarity Analysis
Using FastText embeddings, we calculated various semantic similarity measures. The average cosine similarity for the group with schizophrenia was 0.2, while for the healthy group it was 0.19. In addition to the average cosine similarity, we analyzed other measures, which are summarized in Table 5.

Table 5: Semantic Similarity Measures.

Measure                         t-statistic    p-value
Mean Semantic Distance          -0.59          0.554
Global Optimality Divergence    2.75           0.007
Local Optimality Divergence     0.29           0.769
Repetitions                     -2.26          0.026
Intrusions                      -1.83          0.070
Neologisms                      -4.47          < 0.001

The differences in global optimality divergence and in the occurrences of intrusions, repetitions, and neologisms were significant, highlighting disruptions in the overall semantic coherence and increased errors in individuals with schizophrenia.
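The group-level statistics in Table 5 can be assembled by computing each embedding-based measure per participant and then comparing the two groups, as in the self-contained sketch below for mean semantic distance only. The word sequences, the consecutive-pair definition of the measure, and the argument order of the t-test (which fixes the sign of the t-statistic) are all our own illustrative assumptions.

    import fasttext
    import numpy as np
    from scipy import stats

    model = fasttext.load_model("cc.sl.300.bin")  # pretrained Slovenian vectors

    def mean_semantic_distance(words):
        # Assumed definition: 1 minus the mean cosine similarity of
        # consecutive responses in one participant's sequence.
        vecs = [model.get_word_vector(w) for w in words]
        sims = [np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
                for a, b in zip(vecs, vecs[1:])]
        return 1.0 - float(np.mean(sims))

    # Placeholder sequences; in the study, each inner list would hold one
    # participant's one-minute production.
    sp_lists = [["luč", "lev", "lak"], ["les", "lonec", "lutka"]]
    hc_lists = [["letalo", "ladja", "luna"], ["lipa", "list", "lok"]]

    t, p = stats.ttest_ind([mean_semantic_distance(w) for w in sp_lists],
                           [mean_semantic_distance(w) for w in hc_lists])
    print(f"Mean Semantic Distance: t = {t:.2f}, p = {p:.3f}")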
5 DISCUSSION
The results of this study highlight significant differences in verbal fluency performance between individuals with schizophrenia and healthy controls. Healthy participants produced a higher number of correct words and exhibited more coherent and interconnected semantic structures compared to individuals with schizophrenia. These findings are consistent with established research on the cognitive impairments linked to schizophrenia, particularly in the domain of verbal fluency.

In addition to the differences in correct word production, individuals with schizophrenia also showed a significantly higher frequency of errors, including repetitions, neologisms, and intrusions. These errors are characteristic of the cognitive disorganization associated with schizophrenia and reflect the impaired executive function and semantic processing commonly observed in the disorder. The increased number of neologisms and intrusions further highlights the semantic and linguistic disruptions that differentiate individuals with schizophrenia from healthy controls.

Furthermore, although the use of FastText embeddings was a key aspect of this study, the method was not sensitive enough to capture local semantic disruptions. Measures such as cosine similarity and mean semantic distance, which focus on local semantic relationships, failed to reveal significant differences between the groups. This outcome suggests either that local word relationships remain relatively preserved in individuals with schizophrenia or that FastText embeddings cannot effectively capture subtle local disruptions. The broader semantic coherence, however, was clearly impacted, as evidenced by the significant differences in global optimality divergence, demonstrating a marked reduction in overall word-sequence coherence among individuals with schizophrenia.

5.1 Conclusions
In conclusion, our study highlights the significant cognitive and linguistic impairments in individuals with schizophrenia, particularly in verbal fluency performance. We matched participants on age and gender to ensure comparability between the groups. While education is known to influence cognitive abilities, our analysis confirmed that within each educational level there were no significant differences between the two groups. However, a larger sample size is needed to increase the power of our statistical analyses and to allow for better generalizability and a more in-depth exploration of all demographic variables.

Additionally, while FastText embeddings provided useful insights into semantic coherence, they were not sensitive enough to capture more subtle cognitive impairments. Future studies should explore alternative methods to provide a more comprehensive understanding of verbal fluency deficits in schizophrenia, contributing to improved diagnostic and therapeutic strategies.

REFERENCES
[1] Flavia Galaverna, Adrián M. Bueno, Carlos A. Morra, María Roca, and Teresa Torralva. 2016. Analysis of errors in verbal fluency tasks in patients with chronic schizophrenia. The European Journal of Psychiatry 30 (2016), 305–320.
[2] Kyrsten Grimes, George Foussias, Gary Remington, Kathryn Kalahani-Bargis, and Konstantine Zakzanis. 2020. Stability of Verbal Fluency in Outpatients with Schizophrenia. Psychiatry Research 295 (2020), 113528. https://doi.org/10.1016/j.psychres.2020.113528
[3] Mary Kosmidis, Vassilis Bozikas, Christina Vlahou, Grigoris Kiosseoglou, George Giaglis, and Athanasios Karavatos. 2005. Verbal fluency in institutionalized patients with schizophrenia: Age-related performance decline. Psychiatry Research 134 (2005), 233–240. https://doi.org/10.1016/j.psychres.2005.02.003
[4] Nana Lehtinen, Ida Luotonen, and Anna Kautto. 2021. Systematic administration and analysis of verbal fluency tasks: Preliminary evidence for reliable exploration of processes underlying task performance. Applied Neuropsychology: Adult 30 (2021), 1–13. https://doi.org/10.1080/23279095.2021.1973471
[5] Natalia Ojeda, Pedro Sanchez, Javier Peña, Edorta Elizagárate, Ana Yoller, Juan Larumbe, Miguel Gutiérrez-Fraile, Leonardo Casais, and Jesús Ezcurra. 2010. Verbal Fluency in Schizophrenia: Does Cognitive Performance Reflect the Same Underlying Mechanisms in Patients and Healthy Controls? The Journal of Nervous and Mental Disease 198 (2010), 286–291. https://doi.org/10.1097/NMD.0b013e3181d61748
[6] Angel Nevado, David Del Río, María Teresa Martín-Aragoneses, José M. Prados, and Ramón López-Higes. 2021. Preserved semantic categorical organization in mild cognitive impairment: A network analysis of verbal fluency. Neuropsychologia 157 (2021), 107875. https://doi.org/10.1016/j.neuropsychologia.2021.107875
[7] Dália Nogueira, Elizabeth Reis, and Ana Vieira. 2016. Verbal Fluency Tasks: Effects of Age, Gender, and Education. Folia Phoniatrica et Logopaedica 68 (2016), 124–133. https://doi.org/10.1159/000450640
[8] Matthew M. Nour, Daniel C. McNamee, Yunzhe Liu, and Raymond J. Dolan. 2023. Trajectories through semantic spaces in schizophrenia and the relationship to ripple bursts. Proceedings of the National Academy of Sciences 120, 42 (2023), e2305290120. https://doi.org/10.1073/pnas.2305290120

Index of Authors

Bedrač, Žan, 33
Bele, Simon, 7
Bubnic, Bostjan, 49
Bubnič, Gal, 49
Bóta, András, 61
Dolanc, Domen, 15
Filipič, Bogdan, 2
Fister, Iztok, 1, 78
Georgiev, Dejan, 65
Grošelj, Ema Leila, 3, 45
Hajdu, László, 61
Jeromel, Aljaž, 15
Kohek, Štefan, 11, 78
Kosar, Tomaž, 49
Krész, Miklós, 61
Lukač, Niko, 11, 15, 78
Marinković, Mila, 73
Micev, Riste, 69
Milosavljević, Vera, 37
Mlakar, Uroš, 41
Molan, Nika, 45
Natev, Andrej, 57
Nikov, Mitko, 33
Paliska, Dejan, 37
Pavlović, Jovan, 61
Piskar, Sašo, 15
Plankelj, Marko, 41
Poljanšek, Tomaž, 3
Pur, Aleksander, 11
Pustoslemšek, Jure, 19
Repič, Tjaša, 15
Rihter, Nejc, 19
Rogelj, Peter, 69
Roj, Lea, 11
Rus Prelog, Polona, 73
Vovk, Klemen, 45
Zadravec, Hana, 53
Zakšek, Martina, 73
Zupanič, Matjaž, 65
Črne, Ema, 19
Šinko, Matija, 23
Šprajc, Žan Tomaž, 33
Žabkar, Jure, 7, 65, 73

Proceedings of the 10th Student Computing Research Symposium (SCORES'24)

Niko Lukač (ed.)
Iztok Fister (ed.)
Štefan Kohek (ed.)
niko.lukac@um.si, University of Maribor, Faculty of Electrical Engineering and Computer Science, Maribor, Slovenia
iztok.fister@um.si, University of Maribor, Faculty of Electrical Engineering and Computer Science, Maribor, Slovenia
stefan.kohek@um.si, University of Maribor, Faculty of Electrical Engineering and Computer Science, Maribor, Slovenia

ABSTRACT
The 2024 Student Computing Research Symposium (SCORES 2024), organized by the Faculty of Electrical Engineering and Computer Science at the University of Maribor (UM FERI) in collaboration with the University of Ljubljana and the University of Primorska, showcases innovative student research in computer science. This year's symposium highlights advancements in fields such as artificial intelligence, data science, machine learning algorithms, computational problem-solving, and healthcare data analysis. The primary goal of SCORES 2024 is to provide a platform for students to present their research, fostering early engagement in academic inquiry. Beyond research presentations, the symposium seeks to create an environment where students from different institutions can meet, exchange ideas, and build lasting connections. It aims to cultivate friendships and future research collaborations among emerging scholars. Additionally, the conference offers an opportunity for students to interact with senior researchers from institutions beyond their own, promoting mentorship and broader academic networking.

KEYWORDS
student conference, computer and information science, artificial intelligence, data science, data mining

ISBN 978-961-286-914-4
DOI: https://doi.org/10.18690/um.feri.6.2024