https://doi.org/10.31449/inf.v48i15.6261

Virtual Simulation of Dance by Integrating VR Technology and Motion Capture Technology

Ran Tao
College of Music and Dance, Huaqiao University, Xiamen 361021, China
E-mail: wudaotaoran@163.com

Keywords: virtual reality technology, three-dimensional bone extraction, motion capture, virtual simulation, graph convolutional neural network

Received: May 24, 2024

Because virtual simulation technology alone struggles to meet the demands of dance for interactivity, multi-person collaboration and real-time performance, a virtual dance simulation method combining virtual reality technology and motion capture technology is proposed. Motion capture technology records dancers' movements and converts them into 3D movements in the virtual environment; real-time feedback and interaction then provide an effective learning and performance tool for dance learners and performers. The model was trained on the Human3.6M dataset and achieved 81% accuracy on the self-built dataset, with validation accuracies of 80% and 55% on the NTU RGB+D and Kinetics datasets, respectively. Compared with other methods, the accuracy of the proposed method increased by 14.89%, 7.99% and 19.34%, respectively. The proposed dance virtual simulation method reached an accuracy of up to 92% across the movement tests. The results show that the dance virtual simulation method based on the fusion of virtual simulation technology and motion capture technology achieves limb interactivity and improves the accuracy of dance movement recognition, which has positive application value in the field of virtual dance experience. This study is expected to promote the further development of the virtual dance experience and to provide more opportunities and room for innovation for dance enthusiasts and professional dancers.

Povzetek: An original virtual dance simulation method is proposed that combines virtual reality technology and motion capture to improve the interactivity and recognition of dance movements. The contribution enables realistic simulation, promotes innovation in dance, and improves dancers' learning and performance experience.

1 Introduction

As an art form with a long history and profound cultural connotation, dance has long attracted enthusiasts and audiences with its unique artistic charm. However, traditional dance teaching and performance modes are often limited by venue, time and other factors, making efficient learning and creation difficult [1-3]. At the same time, the development of dance art requires continuous innovation and breakthroughs to meet the needs and aesthetics of modern society. In this context, virtual dance simulation, which integrates Virtual Reality (VR) and motion capture technology, has become a solution that attracts much attention. By introducing virtual reality technology into the field of dance, dancers can explore freely in a virtual environment and achieve a more immersive learning and performance experience. At the same time, motion capture technology enables every minute movement of a dancer to be accurately recorded, providing powerful data support for the improvement of skills [4-5].
Based on this, a virtual dance simulation method combining VR technology and motion capture technology is proposed. First, dancers wear sensor gear that captures their body movements in real time and translates them into digital information. Through VR technology, users can experience simulated dance in a virtual environment, adjust various parameters of the dance, and receive real-time feedback to improve their skills. The research aims to provide more ways to promote dance innovation, enhance the virtual dance experience through interaction, and bring new possibilities to the field of dance. The innovation of the research is to integrate VR technology and motion capture technology to further improve the real-time performance of virtual simulation. This method is expected to improve the interactivity and realism of virtual dance and to provide a new way for dance innovation. The study is divided into five parts. The first part introduces the research background, problems and solutions of dance virtual simulation. The second part summarizes the research results of dance virtual simulation and the difficulties and shortcomings of existing methods. The third part introduces the dance virtual simulation method combining VR technology and motion capture technology. The fourth part designs comparative experiments that verify the accuracy and real-time performance of the proposed method, including a comparison with Du's method. The fifth part summarizes the research methods, analyzes the experimental results, and puts forward the shortcomings and prospects of the method.

2 Related works

In recent years, virtual simulation techniques have developed rapidly and have had a significant impact on several fields. These technologies mainly use computer simulation and virtual reality to create near-reality experiences and environments. To design a recommender system for shopping information search that does not require much effort from users, Pfeiffer et al. proposed using a support vector machine combined with virtual reality technology to classify users' shopping search motives, so that shopping motives are identified in advance during the search process [6]. Wei et al. proposed a teaching platform for electrical engineering courses based on a virtual simulation platform, aiming to overcome the limitations of traditional teaching; experiments show that the platform further improves students' comprehension and hands-on ability, thereby improving the quality of teaching [7]. Li and other scholars proposed a virtual simulation experimental platform for chemistry experiments, aiming to improve teaching efficiency while reducing the risks associated with experimental learning; experiments show that this method can significantly improve students' innovative skills and the safety of experiments [8]. Hu and other researchers proposed a virtual simulation-based distributed wind power generation experimental system, aiming to improve students' ability to deal with complex engineering problems; experiments showed that the system effectively improves this ability [9]. On the other hand, with the development of VR technology, it has been widely used in many fields such as education and training, design and construction, and military and prevention applications.
The combination of VR technology and motion capture technology can customise virtual environments according to the user's actions and reactions, providing a more realistic experience. To determine the feasibility of a commercially available virtual reality gaming system for upper-extremity rehabilitation of community-based stroke survivors, Warland et al. conducted a study of post-stroke rehabilitation training using a virtual reality system. Data from the rehabilitation training movements of stroke patients were captured and analyzed, revealing new ways to reduce impairment and increase spontaneous use of functional activities [10]. Maciejewski and other researchers proposed a shot-tracking method based on VR technology, aiming to improve the accuracy of military shooting training; experiments show that the method achieves high values on the selected attributes [11]. Yang and other researchers proposed a deep-learning-based body pose recognition method for VR technology, aiming to capture human body movement data; experiments showed that the method can effectively record various body postures [12]. Qiu et al. proposed a lighter and cheaper wireless inertial motion capture system to improve the localization accuracy of human motion capture. The body sensor network serves as the basis for foot displacement calculation using a zero-velocity update algorithm, and pose and foot trajectory are fused to reconstruct human pose and displacement simultaneously [13]. In summary, virtual simulation is used in fields such as medicine, aviation and the military to provide lower-risk hands-on training environments that help learners gain experience without endangering the real world. However, it still suffers from low real-time performance and a lack of experimental data. Therefore, this research proposes a dance virtual simulation that integrates VR technology and motion capture technology, which is expected to alleviate problems such as the low accuracy of VR technology alone. In addition, the study compares the proposed method with existing methods, as shown in Table 1.

Table 1: Comparison of different methods

Authors | Year of publication | Methodologies | Key results and findings | Limitations
Zhao et al. [6] | 2020 | 3DS MAX modelling, animation software and Unreal Engine 4 game engine | The platform has a relatively high focus on testing and has high testing accuracy and security | Depth image acquisition equipment is not easy to carry
Wei et al. [7] | 2023 | Teaching platform for electrical engineering courses based on a virtual simulation platform | Addresses the limitations of traditional teaching methods on students' understanding of the frontiers of science | Less interactive and real-time
Li et al. [8] | 2022 | Virtual simulation lab platform for chemistry experiments | Improved teaching efficiency | Poor interactivity and real-time performance
Hu et al. [9] | 2021 | Virtual simulation-based experimental system for distributed wind power generation | Improved students' ability to cope with complex engineering problems | Poor interactivity
Braun et al. [10] | 2022 | Motion capture method for firefighter training based on VR technology | Effectively improved firefighter training | Occlusion of human motion capture
Maciejewski et al. [11] | 2020 | VR technology shot tracking methods | Achieved higher selected attributes | Less real-time
Yang et al. [12] | 2021 | Deep neural network-based body pose recognition method for VR technology | Improved field tracking efficiency | Poor interactivity
Foreman et al. [13] | 2019 | A training model for Parkinson's patients based on VR technology | Effectively improves the training effect of patients | Poor interactivity
This paper | - | Virtual simulation of dance based on graph convolutional modelling and VR | Effectively improves the interactivity of limbs and the recognition accuracy of dance movements | -
3 Research on virtual dance simulation method based on VR technology

The dance virtual simulation method first captures the dancer's movement data and processes it into digitised information. Next, these data are used to create a virtual dance scene. Subsequently, the captured movements are transformed into a virtual performance. Finally, users interact with the virtual dance through their devices and receive feedback in real time. This approach provides a more immersive dance experience.

3.1 3D bone extraction and human motion capture method

Traditional 3D skeleton extraction algorithms suffer from high computational cost, sensitivity to lighting, viewpoint changes and occlusion, and a lack of accuracy and stability in complex scenes [14-15]. Unlike traditional methods, Openpose adopts a bottom-up approach, which first detects the locations of human joints and then connects these joints to construct the human skeleton [16]. In addition, Openpose focuses on the detection of human joint points and can therefore better handle problems such as self-occlusion. Figure 1 shows the workflow in which the VGG backbone receives an image and divides the feature map into two branches.

Figure 1: The VGG receives the picture and divides the feature map into two branches (a multi-stage network of 3×3 convolution and 2×2 pooling layers; each stage produces two branches, each trained with its own loss).

In Figure 1, the network model is divided into multiple stages, with the input to each stage being the output of the previous stage combined with the original feature map. The output of each stage consists of a set of positional confidence maps and a set of 2D vector fields. The network is optimised by computing a loss function and gradually learns to distinguish the left and right sides of the body as the number of iterations increases. Eventually, the joint point coordinates are assigned to each individual's body through the part affinity fields. The ground-truth value of the positional confidence map is given by Eq. (1).

$$S^*_{j,k}(p) = \exp\left(-\frac{\|p - x_{j,k}\|_2^2}{\sigma^2}\right) \quad (1)$$

In Eq. (1), $p \in \mathbb{R}^2$ denotes a pixel position in the currently predicted picture, $x_{j,k}$ denotes the location of joint $j$ of person $k$, and $\sigma$ is a constant. The ground-truth value of the part affinity vector field $L$ is given by Eq. (2).

$$L^*_{c,k}(A) = \begin{cases} v, & \text{if } A \text{ falls on the } c\text{-th limb of the } k\text{-th person} \\ 0, & \text{otherwise} \end{cases} \quad (2)$$

In Eq. (2), $A$ denotes a pixel point. When $A$ lies on the $c$-th limb of the $k$-th person, $L^*_{c,k}(A)$ equals the unit vector $v$ between the two keypoints of that limb; otherwise it is 0, meaning that the pixel does not lie on the human torso. The unit vector is obtained by dividing the vector by its modulus, as shown in Eq. (3).

$$v = \frac{x_{j_2,k} - x_{j_1,k}}{\|x_{j_2,k} - x_{j_1,k}\|_2} \quad (3)$$

In Eq. (3), $v$ denotes the unit direction vector of the limb on which the pixel lies.
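To make the ground-truth construction of Eqs. (1)-(3) concrete, the sketch below generates a part confidence map for one joint and a part affinity field for one limb. It is a minimal illustration assuming NumPy arrays for joint coordinates; the grid size, the value of sigma and the limb-width threshold are illustrative choices rather than values specified in this paper.

```python
import numpy as np

def confidence_map(joint_xy, height, width, sigma=6.0):
    """Gaussian ground-truth confidence map for one joint, Eq. (1)."""
    ys, xs = np.mgrid[0:height, 0:width]
    d2 = (xs - joint_xy[0]) ** 2 + (ys - joint_xy[1]) ** 2
    return np.exp(-d2 / sigma ** 2)

def paf_unit_vector(joint_a, joint_b):
    """Unit direction vector of the limb joint_a -> joint_b, Eq. (3)."""
    diff = np.asarray(joint_b, dtype=float) - np.asarray(joint_a, dtype=float)
    return diff / (np.linalg.norm(diff) + 1e-8)

def paf_map(joint_a, joint_b, height, width, limb_width=4.0):
    """Part affinity field for one limb, Eq. (2): the unit vector v on pixels
    that fall on the limb, zero elsewhere."""
    v = paf_unit_vector(joint_a, joint_b)
    ys, xs = np.mgrid[0:height, 0:width]
    rel_x, rel_y = xs - joint_a[0], ys - joint_a[1]
    along = rel_x * v[0] + rel_y * v[1]            # projection onto the limb axis
    across = np.abs(rel_x * v[1] - rel_y * v[0])   # distance from the limb axis
    limb_len = np.linalg.norm(np.subtract(joint_b, joint_a))
    on_limb = (along >= 0) & (along <= limb_len) & (across <= limb_width)
    field = np.zeros((height, width, 2))
    field[on_limb] = v
    return field

# Example: a hypothetical elbow -> wrist limb on a 64x64 grid
S = confidence_map((40, 22), 64, 64)
L = paf_map((40, 22), (52, 30), 64, 64)
```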
This study utilised the 2D human pose estimation results to obtain 3D human skeleton information. The 2D human skeleton information was rapidly extracted by the Openpose tool and then matched to a pre-trained 3D model to achieve the conversion from 2D to 3D [17-18]. In the study, the Openpose open-source library was used to process frame images intercepted from ordinary video, and a GPU-based version running on a Linux system was chosen to complete the extraction of the 2D human skeleton. Figure 2 shows the numbering of the human joint points and the COCO-18 human skeleton.

Figure 2: Numbering of the human joint points and the COCO-18 human skeleton ((a) correspondence table of joint serial numbers; (b) COCO-18 skeleton diagram).

In Figure 2, three keypoint configurations are available, with 15, 18 and 25 keypoints; experiments found that selecting 18 keypoints works best. Openpose is used to obtain an ordered set of 2D joint point data from ordinary video frames. By pairing the 3D/2D pose datasets, each item of 2D pose information is related through the trained "virtual camera" to the original 3D pose set. Therefore, matching any 2D skeleton $p_{2D} = \{(x, y)\}$ to 3D skeleton information can be described by Eq. (4).

$$P\left(p^i_{3D} \mid p_{2D}\right) \propto \exp\left(-\left\|W_i\, p^i_{3D} - p_{2D}\right\|_2\right) \quad (4)$$

In Eq. (4), $P_{3D}$ denotes the 3D pose library and $W_i$ denotes the corresponding virtual camera transform matrix. Since the algorithm is mainly used for action recognition, the skeleton needs to be extracted from consecutive frames of the action video stream. Because the human body moves continuously, the human pose does not change significantly from frame to frame over a short period of time. Therefore, the human motion poses in any two consecutive frames satisfy the relationship described in Eq. (5).

$$B = \sum_{i=1}^{16} \sqrt{\left(p_{m,i}^x - p_{n,i}^x\right)^2 + \left(p_{m,i}^y - p_{n,i}^y\right)^2 + \left(p_{m,i}^z - p_{n,i}^z\right)^2} < \beta \quad (5)$$

In Eq. (5), $B$ denotes the summed Euclidean distance between corresponding joints of two 3D poses, $\beta$ is a scalar parameter, and $m, n$ denote similar frames in the video sequence. In the improved matching algorithm, the first frame uses global matching, in which Euclidean distances are computed across the entire 3D pose library to find the minimum value. Subsequent frames do not use global matching; instead, matching is performed within the $\lambda$ samples before and after the position matched in the previous frame. This is done because the sample poses of the 3D pose library conform to the principle of continuity and locality of human motion, as detailed in Eq. (6).

$$B_{\min} = \min_i B_i \quad (6)$$

In Eq. (6), $i$ denotes the frame sequence number in the video stream.
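The matching of Eqs. (4)-(6) can be viewed as a nearest-neighbour search in the 3D pose library with a temporal window. The code below is a minimal sketch assuming the extracted 2D skeletons and the library's projected 2D poses are NumPy arrays of shape (18, 2); the window size lam and the library layout are illustrative assumptions, not the paper's exact data structures.

```python
import numpy as np

def pose_distance(p_a, p_b):
    """Summed per-joint Euclidean distance between two poses (cf. Eq. (5))."""
    return np.linalg.norm(p_a - p_b, axis=1).sum()

def match_sequence(skeletons_2d, library_2d, library_3d, lam=50):
    """Match each 2D frame to a 3D library pose.

    skeletons_2d: (T, 18, 2) extracted 2D poses
    library_2d:   (M, 18, 2) library poses projected by their virtual cameras
    library_3d:   (M, 18, 3) corresponding 3D poses
    The first frame is matched globally; later frames search only within
    +/- lam library samples around the previous match (cf. Eq. (6)).
    """
    matched, prev = [], None
    for p2d in skeletons_2d:
        if prev is None:                      # global match for the first frame
            candidates = np.arange(len(library_2d))
        else:                                 # local match around the previous index
            lo, hi = max(0, prev - lam), min(len(library_2d), prev + lam + 1)
            candidates = np.arange(lo, hi)
        dists = [pose_distance(p2d, library_2d[c]) for c in candidates]
        prev = int(candidates[int(np.argmin(dists))])
        matched.append(library_3d[prev])
    return np.stack(matched)                  # (T, 18, 3) reconstructed 3D skeletons
```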
When processing graph data, node features and structural information need to be considered together. Hand-crafted rules struggle to capture complex patterns, so a graph convolutional neural network, a deep learning method, is introduced; its convolution operation is shown in Eq. (7).

$$h_a^{l+1} = \sigma\left(\sum_{j \in N_a} \frac{1}{c_{aj}}\, h_j^l\, w_{R_j}\right) \quad (7)$$

In Eq. (7), $l$ denotes the layer of the neural network; $a$ denotes a node in the graph structure; $c_{aj}$ denotes the normalisation factor, taken as the reciprocal of the node degree; $N_a$ denotes the neighbouring nodes of node $a$; $R_j$ denotes the type of node $j$, and $w$ denotes the weight parameter. The matching algorithm usually yields consecutive frames over a period of time, containing the human 3D skeleton at different points in time, with multiple joint points in each frame. To capture the overall motion characteristics of the skeleton, a skeleton spatio-temporal graph, which is an undirected graph, is constructed; see Figure 3.

Figure 3: Space-time graph of the human skeleton.

The graph G in Figure 3 contains T frames with N joints per frame, and the joints are connected by skeleton edges within a frame and by trajectory edges across frames. Unlike 2D or 3D convolutional neural networks, graph convolution differs in its implementation. In the spatio-temporal graph, the spatial graph within a single frame is represented by the adjacency matrix $A$, and the self-connections are represented by the identity matrix $I$. The graph convolution of the spatial map within a single frame can be expressed by Eq. (8).

$$f_{out} = \Lambda^{-\frac{1}{2}} (A + I) \Lambda^{-\frac{1}{2}} f_{in} W \quad (8)$$

In Eq. (8), $f_{in}$ is the input feature graph, represented by a tensor of shape $(C, V, T)$; the convolution performs a standard two-dimensional convolution followed by multiplication with the normalised adjacency matrix. Throughout the graph convolution, the receptive field size, i.e. the distance to the root node, is set to 1, similar to image convolution in a CNN. The study enlarges the receptive field by setting the distance D to 2, which increases the number of point sets adjacent to the root node; the change in the receptive field is shown in Figure 4.

Figure 4: Subsets with adjacency distance 2 and the combined distance and centre-of-gravity division strategies ((a) subsets with an adjacency distance of 2; (b) combined distance and centre-of-gravity distance division strategy).

In Figure 4, to adjust the size of the convolution kernel, the study focuses on the value of K in the spatial dimension. The neighbouring point set division strategy is redesigned, combining distance with the spatial configuration of the motion. First, based on the distance from a node to the root node, the surrounding nodes are divided into three parts: the root node itself, the set of points at distance 1, and the set of points at distance 2. These sets are then divided, according to the spatial configuration, into nodes close to the centre of gravity of the overall skeleton and nodes far from it, resulting in a division of the receptive field into five subsets.
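A compact sketch of the spatial graph convolution in Eq. (8) is given below. It assumes an input tensor of shape (C, V, T) (channels, joints, frames) and a binary skeleton adjacency matrix; handling the weights as a 1×1 convolution over channels is one common ST-GCN-style realisation chosen here for illustration, not a detail taken from the paper.

```python
import numpy as np

def normalized_adjacency(A):
    """Lambda^{-1/2} (A + I) Lambda^{-1/2}, the normalisation used in Eq. (8)."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return d_inv_sqrt @ A_hat @ d_inv_sqrt

def spatial_graph_conv(f_in, A, W):
    """Spatial graph convolution of Eq. (8).

    f_in: (C, V, T) input features (channels, joints, frames)
    A:    (V, V) skeleton adjacency matrix
    W:    (C, C_out) learnable weights (a 1x1 convolution over channels)
    """
    A_norm = normalized_adjacency(A)                      # (V, V)
    # aggregate neighbouring joints frame by frame, then mix channels
    aggregated = np.einsum('vu,cut->cvt', A_norm, f_in)   # (C, V, T)
    return np.einsum('cvt,co->ovt', aggregated, W)        # (C_out, V, T)

# Toy example: 3 input channels, 18 joints, 16 frames
rng = np.random.default_rng(0)
A = np.zeros((18, 18)); A[0, 1] = A[1, 0] = 1             # one skeleton edge as a stand-in
f_in = rng.standard_normal((3, 18, 16))
f_out = spatial_graph_conv(f_in, A, rng.standard_normal((3, 64)))
print(f_out.shape)  # (64, 18, 16)
```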
3.2 Research on virtual simulation of human motion capture dance

Based on 3D skeleton extraction and skeleton movement classification techniques, a VR human-computer interaction system that uses body movements as operation commands is designed and applied to virtual reality projects. The user and processing terminal is responsible for keeping the programme running, processing operation commands and directing the display of the head-mounted VR device; a Cardboard eyeglass case is used as the head-mounted terminal [19]. The data acquisition and processing terminal shoots videos of limb movements, collects data, and analyses and extracts the human skeleton for movement classification. This module understands body semantics, converts the semantics into interactive instructions, and sends them to the user and processing terminal. The whole interaction process is shown in Figure 5.

Figure 5: Flow chart of the VR interactive system (video stream → Openpose 2D skeleton extraction → matching-based 3D skeleton extraction → graph-convolution action recognition → interactive instructions for the probe-orbit simulation display).

In Figure 5, the whole VR programme is first launched; it is developed with the Cardboard developer SDK on top of the Unity virtual engine and is finally packaged with Unity and installed on a mobile phone. The phone is then placed into the glasses case, the camera is activated, the 2D skeleton of the human body is extracted in real time through Openpose, and video recording is initiated when the amplitude of movement reaches a specific threshold [20]. The 2D skeleton is converted to a 3D skeleton by the matching algorithm and input into the action recognition model for classification. The classification results are mapped into interaction commands and fed back to the VR simulation programme. By enriching the action semantics, the operation commands can be extended to realise new human-computer interaction modes, as shown in Figure 6.

Figure 6: Relationship between body semantics and operational instructions (movements such as standing up, waving, kicking and walking are mapped one-to-one to operations such as close, open, zoom in and shear).

In Figure 6, the study uses existing categorised actions, filtered to correspond one-to-one with the operation commands. This information is sent to the VR display module through the action recognition module, and after discrimination a delegate function is called to realise functions such as viewpoint switching and zooming in and out, which are implemented on the Unity platform.
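The mapping from recognised action classes to operation commands in Figure 6 can be expressed as a simple dispatch table. The sketch below is illustrative only: the class names, command names and callback functions are hypothetical stand-ins, since the paper implements the actual callbacks as Unity delegate functions on the display side.

```python
from typing import Callable, Dict

# Hypothetical one-to-one mapping from body semantics to commands, in the spirit
# of Figure 6 (the labels are illustrative, not the paper's exact categories).
ACTION_TO_COMMAND: Dict[str, str] = {
    "stand_up": "open",
    "wave": "close",
    "kick": "zoom_in",
    "walk": "zoom_out",
}

def make_dispatcher(handlers: Dict[str, Callable[[], None]]):
    """Return a function that turns a classifier output into a command call."""
    def dispatch(action_label: str) -> None:
        command = ACTION_TO_COMMAND.get(action_label)
        if command is not None and command in handlers:
            handlers[command]()   # stand-in for a delegate call on the VR display side
    return dispatch

# Usage with placeholder handlers standing in for the Unity-side delegates
dispatch = make_dispatcher({
    "open": lambda: print("open scene object"),
    "close": lambda: print("close scene object"),
    "zoom_in": lambda: print("zoom viewpoint in"),
    "zoom_out": lambda: print("zoom viewpoint out"),
})
dispatch("kick")   # -> zoom viewpoint in
```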
For the purpose of the study, a large number of dance action models need to be built for training; Eq. (9) gives the library of input samples.

$$X = \{g_{1,1}, g_{1,2}, g_{1,3}, \ldots, g_{c,n}\} \quad (9)$$

In Eq. (9), $g_{c,n}$ denotes the action fragment of the n-th gesture of class c, and each fragment contains m action poses. The composition of the action fragment $g_{c,n}$ is given by Eq. (10).

$$g_{c,n} = \{P_1, P_2, P_3, \ldots, P_m\} \quad (10)$$

After a complete dance movement segment is projected into the output space, a "trajectory" is formed in the output space and a set of index numbers containing timing information is obtained. The specific calculation formula is shown in Eq. (11).

$$O_{c,n} = \{\, o(t) \mid t \in T \,\} \quad (11)$$

In Eq. (11), $O_{c,n}$ identifies the index sequence of dance moves for each category. According to the histogram statistics rule, the histogram of a movement sequence containing n gestures can be represented by Eq. (12).

$$H_{c,n}(o_u) = \frac{f_u}{n} \quad (12)$$

In Eq. (12), $f_u$ denotes the frequency of occurrence of the u-th output node in the dance action, and n denotes the number of gestures included in the dance. New input movements are matched against the known movement templates by Euclidean distance to discriminate the category of an unknown movement. In a dance self-learning system, this recognition process can be done offline or online. Action variability is calculated from the normalised comparison of the feature vectors of two gestures, as described in Eq. (13).

$$d(p_i, p_j) = \sum_{k=1}^{N} w_k \frac{f_{i,k} - f_{j,k}}{f_k(\max) - f_k(\min)} \quad (13)$$

In Eq. (13), $f_{i,k}$ and $f_{j,k}$ are the values of the k-th feature for postures $p_i$ and $p_j$ respectively; $f_k(\max)$ and $f_k(\min)$ are the maximum and minimum values of the k-th feature, and $w_k$ is the weight of the k-th feature. In the study, the joint-angle Euclidean distance is used to evaluate the accuracy of the simulated movement, as given by Eq. (14).

$$E_{avg} = \frac{1}{n} \sum_{i=1}^{n} D\left(R_i^{\theta}, C_i^{\theta}\right) \quad (14)$$

In Eq. (14), $E_{avg}$ denotes the average error and n denotes the total number of frames describing the action. $R_i^{\theta}$ denotes the actual joint value, $C_i^{\theta}$ denotes the simulated value of the virtual character, and $D(R_i^{\theta}, C_i^{\theta})$ denotes the Euclidean distance between the actual and simulated values in the i-th frame. A common frame is the basis of the action choreography algorithm and is used to ensure that the two action segments before and after a transition have similar poses, so the transition is smooth. For any two poses, the distance between their centres of gravity can be measured using Eq. (15).

$$D_R(f_i, f_j) = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2} \quad (15)$$

In Eq. (15), $(x, y, z)$ denote the coordinates of the centre of gravity of the human body.
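As a concrete reading of Eqs. (13) and (14), the sketch below computes the weighted, range-normalised difference between two pose feature vectors and the average joint error over a sequence of frames. It is a minimal illustration assuming plain NumPy arrays; the absolute value in the dissimilarity makes the measure symmetric, a choice not spelled out in the paper, and the feature ranges and weights are placeholder values.

```python
import numpy as np

def pose_dissimilarity(f_i, f_j, f_min, f_max, w):
    """Weighted, range-normalised difference between two pose feature vectors, cf. Eq. (13)."""
    return float(np.sum(w * (np.abs(f_i - f_j) / (f_max - f_min + 1e-8))))

def average_joint_error(real_angles, simulated_angles):
    """Average per-frame Euclidean distance between real and simulated joint
    angles, Eq. (14). Inputs have shape (n_frames, n_joints)."""
    per_frame = np.linalg.norm(real_angles - simulated_angles, axis=1)
    return float(per_frame.mean())

# Toy usage with placeholder numbers
rng = np.random.default_rng(1)
f_i, f_j = rng.random(10), rng.random(10)
print(pose_dissimilarity(f_i, f_j, np.zeros(10), np.ones(10), np.full(10, 0.1)))
print(average_joint_error(rng.random((30, 18)), rng.random((30, 18))))
```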
In summary, the whole process of skeleton-based, graph-convolution human action recognition starts from data acquisition and proceeds through motion capture, data processing, feature extraction and action recognition to the setup of the virtual reality environment; finally, the user interacts through the VR device and improves skills and experience based on real-time feedback. The details are shown in Figure 7.

Figure 7: Implementation process of dance motion capture technology based on virtual technology (data acquisition with motion capture devices → 2D skeleton extraction with Openpose → matching to a pre-trained 3D model → skeleton spatio-temporal graph and graph convolution → action recognition and interactive commands → VR environment built with Unity 3D and SteamVR → user interaction with real-time feedback).

As shown in Figure 7, the study uses the Human3.6M dataset to pre-train the 3D model and conducts simulation experiments of the proposed method on the self-constructed dataset, the NTU RGB+D dataset and the Kinetics dataset. Human3.6M is a large-scale multi-view human motion capture dataset that provides detailed information about human joint positions and is often used for 3D human motion analysis. NTU RGB+D is a multimodal dataset that provides depth information and colour images and is commonly used for 3D action recognition and understanding; it records a wide range of human actions and provides detailed annotation information. Kinetics is a large-scale video dataset commonly used for action recognition and understanding; it contains action videos of many categories, each labelled with an action category. These datasets allow the effectiveness of the proposed method to be verified accurately, so the subsequent simulation experiments are conducted on them. In addition, based on the obtained data, the study uses Openpose to extract 2D skeleton information from video frames and uses the matching algorithm to convert it into 3D skeleton data. Meanwhile, the data are fused and preprocessed by data overlay, feature fusion and spatial alignment.
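To tie the steps of Figure 7 together, the skeleton below strings the earlier sketches into one schematic flow. Every component here is a self-contained stand-in: the 2D extraction stub, the classifier stub and the command table are hypothetical placeholders, not functions provided by the paper or by the OpenPose library.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Stand-in components (hypothetical placeholders, not the paper's code) ---
def extract_2d_skeleton(frame):
    """Stub for the Openpose 2D extraction step; returns 18 (x, y) joints."""
    return rng.random((18, 2))

def classify_action(features, adjacency):
    """Stub for the graph-convolution classifier; returns an action label."""
    return "kick"

ACTION_TO_COMMAND = {"stand_up": "open", "wave": "close",
                     "kick": "zoom_in", "walk": "zoom_out"}

def match_to_3d(skeletons_2d, library_2d, library_3d):
    """Nearest-neighbour stand-in for the Eq. (4)-(6) matching step."""
    idx = [int(np.argmin([np.linalg.norm(p - q) for q in library_2d]))
           for p in skeletons_2d]
    return library_3d[idx]

# --- Schematic end-to-end flow of Figure 7 ---
def run_pipeline(video_frames, library_2d, library_3d, adjacency):
    skeletons_2d = np.stack([extract_2d_skeleton(f) for f in video_frames])  # 2D poses
    skeletons_3d = match_to_3d(skeletons_2d, library_2d, library_3d)         # 2D -> 3D
    features = skeletons_3d.transpose(2, 1, 0)                               # (C, V, T)
    label = classify_action(features, adjacency)                             # recognition
    return ACTION_TO_COMMAND.get(label)                                      # command

command = run_pipeline(video_frames=[None] * 16,
                       library_2d=rng.random((100, 18, 2)),
                       library_3d=rng.random((100, 18, 3)),
                       adjacency=np.eye(18))
print(command)  # zoom_in
```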
4 Virtual technology-based dance motion capture technology validation simulation experiments

The experiments were conducted on an Intel E5-2600 CPU with a software configuration that included CUDA 8.0, cuDNN 6.0, Python 3.5 and TensorFlow 1.4. The object of the study was the NTU RGB+D dataset, whose action samples each consist of an RGB video, a depth map sequence, 3D skeletal data and an infrared video. The 3D skeletal data give, for each frame, the 3D positions of 25 major body joints. The dataset covers a total of 60 different action categories, which are divided into three main groups: everyday actions, interactive actions and complex actions.

4.1 Simulation experiment of dance movement recognition based on virtual technology

The Human3.6M dataset, which contains about 3.6 million labelled human movement samples and their corresponding RGB images, is used as the experimental training data. The whole dataset can be divided into 11 broad categories, each consisting of 15 subcategories; the major classes correspond to 11 different professional modelling experimenters, while the subclasses cover different human movements. Figure 8 shows the results obtained by three different models on the Human3.6M dataset, using the mean error as the evaluation metric.

Figure 8: Comparison of identification errors of different methods on the Human3.6M dataset ((a) recognition results for action group M; (b) recognition results for action group N; the compared methods are LinKDE's approach, Du's approach and the proposed approach).

According to Figure 8(a), the error of the proposed method is the smallest among the compared methods, such as Du's, for all actions except A; the errors of the proposed method on B, C, D, E and F are 118, 95, 96, 137 and 112, respectively. According to Figure 8(b), the error of the proposed method is again the smallest except for action d; the errors of the proposed method on a, b, c, e and f are 99, 120, 115, 117 and 112, respectively. The study then evaluates the effect of the different receptive-field division strategies on the recognition results, using the original dataset as the test set and accuracy as the evaluation metric; Figure 9 shows the corresponding results.

Figure 9: Results of different partitioning strategies on the NTU RGB+D and Kinetics datasets ((a) unique partition and distance partition; (b) spatial configuration partition and comprehensive partition).

The experimental results shown in Figure 9 indicate that the graph convolution model performs differently with different subset division methods. The poor performance with a single division method is mainly because only a simple averaging of features is performed before the graph convolution. The performance of the model progressively improves as the number of subset divisions increases, which highlights the importance of the size of the receptive field and the size of the convolution kernel. The improved model achieves a small increase in performance by enlarging the receptive field and introducing the new subset division strategy. To evaluate the effectiveness of this model, it was compared with several other deep learning models for action recognition, including a 3D convolutional neural network, a two-stream convolutional neural network and a two-stream recurrent neural network. These experiments covered the self-built dataset, the NTU RGB+D dataset and the Kinetics dataset, and mainly evaluated the F1 score and accuracy of each model on the three datasets; the specific results can be seen in Table 2.

Table 2: Comparative experimental results.
Model | Self-built F1 (%) | Self-built Accuracy (%) | Kinetics F1 (%) | Kinetics Accuracy (%) | NTU RGB+D F1 (%) | NTU RGB+D Accuracy (%)
Graph convolution model (this paper) | 82 | 81 | 54 | 55 | 81 | 80
3D convolution model | 50 | 57 | 55 | 56 | 73 | 75
Two-stream recurrent convolution model | 59 | 62 | 54 | 57 | 78 | 81
Two-stream convolution model | 46 | 45 | 55 | 57 | 76 | 79

According to Table 2, the proposed method has the highest accuracy, 81%, on the self-built dataset; on the Kinetics and NTU RGB+D datasets, the accuracies are 55% and 80%, respectively. Compared with the other methods, the accuracy of the proposed method increased by 14.89%, 7.99% and 19.34%, respectively. In terms of F1 score, the proposed method is more advantageous on the self-built and NTU RGB+D datasets, while on the Kinetics dataset the 3D convolutional neural network reaches an F1 score of 55%, against 54% for the proposed method. This difference may be because Kinetics is a large-scale video dataset, and 3D convolutional neural networks are better at capturing spatio-temporal features in raw video. Overall, graph convolution is a message-passing method acting on graph data: human keypoints can be used to build graph data based on the limb connections of the body, and graph convolution can extract features that carry more information about the human structure. The results indicate that the graph convolution model performs well in real time and can effectively combine temporal and spatial features. By constructing a spatio-temporal graph from the human skeleton sequence, the model better captures the trajectory of the human skeleton at different moments.

4.2 Motion capture based virtual simulation experiment for dance

To evaluate the reliability and feasibility of the interactive system, two key indicators, accuracy and real-time performance, are considered in the experiment. In a laboratory environment, the action data of the test participants were collected, and the response accuracy of the action-recognition-based VR interactive system under different action commands was recorded to verify the effectiveness of the interaction process. At the same time, the response speed of the system was evaluated by measuring the difference in response time between the proposed interaction method and the traditional viewpoint-gaze interaction method. The study performed 100 interaction tests for each common action and recorded the number of successes and the average interaction time, as shown in Figure 10.

Figure 10: Experimental results of action recognition accuracy (number of successes out of 100 trials for Move 1 to Move 4).

According to Figure 10, the accuracy of all movement tests exceeded 70%, with the highest being 92% and the lowest 76%, confirming the feasibility of body movements for VR interaction. However, there are significant differences in accuracy between actions. Actions with a large amplitude that extend in the left-right direction are recognised better, while those with a small amplitude that extend in the front-back direction are recognised more poorly. When switching between actions, excessive speed and frequency can lead to misoperation or reduce the accuracy of the operation.
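The evaluation just described boils down to counting successes over repeated trials and averaging response times. The short sketch below shows one way to compute per-action accuracy and mean interaction time from such logs; the trial data structure and the numbers are invented for illustration and do not reproduce the paper's measurements.

```python
from statistics import mean

# Hypothetical trial log: (action, success flag, interaction time in seconds)
trials = [
    ("move_1", True, 1.2), ("move_1", False, 1.5),
    ("move_2", True, 0.9), ("move_2", True, 1.1),
]

def summarise(trials):
    """Per-action success rate and mean interaction time from logged trials."""
    summary = {}
    for action in {t[0] for t in trials}:
        rows = [t for t in trials if t[0] == action]
        summary[action] = {
            "accuracy": sum(r[1] for r in rows) / len(rows),  # successes / trials
            "mean_time_s": mean(r[2] for r in rows),
        }
    return summary

print(summarise(trials))
```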
Considering the importance of real-time performance for interactive experience and efficiency, the average VR interaction time based on body movements is compared with the traditional viewpoint-gaze interaction time; the relevant results are shown in Figure 11.

Figure 11: Comparison of action recognition interaction times (unit: second) for the operations zoom in, zoom out, viewpoint shift and perspective shift, under viewpoint-gaze and action-recognition interaction.

According to the data in Figure 11, averaged over 100 experiments, the action-recognition-based VR interaction time is generally shorter than that of the viewpoint-gaze system, which suggests that action-recognition VR interaction has an advantage. The viewpoint-gaze system requires fixed buttons arranged in the VR space, which is not user-friendly and affects immersion. In contrast, interaction based on body-motion recognition avoids these drawbacks, making the interaction smoother and more efficient. Therefore, body-motion-based VR interaction provides a more reliable alternative for VR glasses. The system, developed in a PC/Windows/VC5.9 software environment, was used to simulate the jumping technique of divers and the serving technique of male volleyball players. The system uses Euler angles in the y-direction to represent the two movements, so that the similarity between the simulated modelling results and the Euler angles can be compared to verify the effectiveness of the system. The technical movements and their Euler angles are shown in Figure 12.

Figure 12: Presentation of the technical movement effects and their Euler angle representation ((a) effect of action 1; (b) Euler angle representation of action 1; (c) effect of action 2; (d) Euler angle representation of action 2).

Figure 12 shows that the system's simulation of the two technical movements is highly consistent with the representation of the y-direction Euler angles. This demonstrates the accuracy of the system in simulation modelling and shows that its simulation results are applicable to real sports training. The system also facilitates adjustment and modification of the movements and allows the simulation results to be compared with training videos in subsequent analysis. Finally, to further confirm the validity and application value of the proposed methodology, feedback was collected from 20 dancers and 5 coaches through questionnaires, interviews, usage tracking and skill enhancement assessment. The specific results are shown in Table 3.

Table 3: User experience feedback

Classification | Recommendation score | Accuracy score | Interactivity score | Ease of use score | Recommendation
Dancers | 95 | 96 | 84 | 90 | Add more dance moves and tutorials to enrich the teaching content
Coaches | 97 | 92 | 80 | 84 | Provide personalised settings to suit different levels of dancers

Table 3 shows the average ratings from the 20 dancers and 5 coaches on the use of the proposed system, with each rating item scored out of 100.
It can be seen that the VR dance simulation system proposed in the study has significant potential to improve dance learning efficiency and engagement, and with further technical optimisation and content expansion it is expected to become an important tool for dance education and training.

5 Discussion

The virtual dance simulation method proposed in the study achieves 81% accuracy on the self-built dataset, and 80% and 55% validation accuracy on the NTU RGB+D and Kinetics datasets, respectively. Compared with existing methods in the literature, such as those of Zhao et al. [6] and Wei et al. [7], the proposed method achieves accuracy improvements of 14.89% and 7.99%, respectively. This improvement may be because the diversity and scale of the training data benefit model training: the multi-view and large-scale nature of the Human3.6M dataset provides rich learning samples and helps improve the generalisation ability of the model. Moreover, compared with the traditional 3D convolutional neural network, the graph convolutional neural network better captures the spatio-temporal features of human skeleton data, especially when dealing with non-regular mesh data. In addition, the combination of motion capture technology and VR technology provides more detailed and accurate motion data, which strongly supports the accuracy of motion recognition.

It is worth mentioning that the study innovatively combines graph convolutional modelling with VR technology for the virtual simulation of dance movements. This approach not only improves the accuracy of movement recognition but also provides more natural and intuitive interaction in a virtual reality environment. Furthermore, through real-time feedback and interactivity, the proposed system offers an innovative learning tool for dance learners and performers.

Although the proposed system performed well in the experiments, there are still some errors and limitations. In the experiments, the accuracy of recognising small-amplitude movements and movements extending in the front-back direction was lower, which may be due to the limited accuracy of the motion capture system in capturing subtle movements. In addition, although the system was able to provide real-time feedback in most cases, there is still room for improvement in the response time for certain complex movements. In terms of user adaptability, some users may need time to adapt to interaction in the VR environment, especially dancers who are not accustomed to using high-tech devices.

In summary, the proposed method provides a novel and effective approach in the field of virtual dance simulation. By combining VR technology and motion capture technology, it improves the accuracy of motion recognition and enhances the user's immersion and interaction experience. In the future, the algorithm will be further optimised to improve the recognition of small movements, and the system response time will be reduced to ensure a smoother interaction experience. More affordable equipment and software solutions will be developed to lower the threshold of use. In addition, the research will explore more application scenarios, such as dance education and rehabilitation training, to give full play to the potential of virtual simulation technology in the field of dance.
6 Conclusion

VR technology and motion capture technology have made huge breakthroughs in recent years and have already revolutionised a number of fields, including entertainment, healthcare, and education and training. One area that has received a great deal of attention is the virtual simulation of dance. This research fuses virtual reality technology with motion capture technology: a motion capture system first records the dancer's movement data accurately, and the data are then fed into a VR system. Within the VR environment, this movement data is used to create an interactive, three-dimensional virtual dance character that allows the user to experience the dance from a first-person perspective. The experimental results show that, when operating on four common body movements and their corresponding commands, the accuracy of all tested movements exceeded 70%, with a high of 92% and a low of 76%, confirming the feasibility of body movements for VR interaction. In the experiments on the display and Euler angle representation of technical movements, the system's simulation of the two technical movements is highly consistent with the y-direction Euler angle representation, which proves the accuracy of the system in simulation modelling and shows that its simulation results are applicable to real sports training. The method improves the efficiency and engagement of dance learning through highly realistic visual and motor feedback, while providing researchers with a new way to analyse dance movements and improve teaching methods. However, the method still suffers from high cost, user adaptability problems and limited naturalness of the dance; more affordable equipment and software solutions can be developed in the future.

References

[1] Li Q Y, Li Z H, & Han J (2021). A hybrid learning pedagogy for surmounting the challenges of the COVID-19 pandemic in the performing arts education. Education & Information Technologies, pp. 7635-7655. https://doi.org/10.1007/s10639-021-10612-1.
[2] Hsia L H, Hwang G J, & Lin C J (2022). A WSQ-based flipped learning approach to improving students' dance performance through reflection and effort promotion. Interactive Learning Environments, pp. 229-244. https://doi.org/10.1080/10494820.2019.1651744.
[3] Gregor S, Vaughan-Graham J, Wallace A, Walsh H, & Patterson K K (2021). Structuring community-based adapted dance programs for persons post-stroke: A qualitative study. Disability and Rehabilitation, pp. 2621-2631. https://doi.org/10.1080/09638288.2019.1708978.
[4] Gao P J, Zhao D, & Chen X A N (2020). Multi-dimensional data modelling of video image action recognition and motion capture in deep learning framework. IET Image Processing, pp. 1257-1264. https://doi.org/10.1049/iet-ipr.2019.0588.
[5] Teer B (2021). Performance analysis of sports training based on random forest algorithm and infrared motion capture. Journal of Intelligent & Fuzzy Systems, pp. 6853-6863. http://dx.doi.org/10.3233/JIFS-189517.
[6] Pfeiffer J, Pfeiffer T, Meissner M, & Weiss E (2020). Eye-tracking-based classification of information search behavior using machine learning: Evidence from experiments in physical shops and virtual reality shopping environments. Information Systems Research, pp. 675-691. http://dx.doi.org/10.1287/isre.2019.0907.
[7] Wei M J, Zhang H X, & Fang T Y (2020). Enhancing the course teaching of power system analysis with virtual simulation platform.
International Journal of Electrical Engineering & Education. https://doi.org/10.1177/0020720920953434.
[8] Li B L, Peng L Y, Gong C B, Chen J R, Zou H L, Luo H Q, & Li N B (2022). Virtual simulation guiding high-risk undergraduate experiments about chemical synthesis of MoS2 monolayers via a Schlenk line. Journal of Chemical Education, pp. 3124-3132. https://doi.org/10.1021/acs.jchemed.2c00252.
[9] Hu M Y, Ji J P, Duan J D, & Wang Q (2021). Distributed wind power virtual simulation experiment system for cultivating the ability to solve complex engineering problems. Computer Applications in Engineering Education, pp. 1441-1452. https://doi.org/10.1002/cae.22396.
[10] Warland A, Paraskevopoulos I, Tsekleves E, Ryan J, Nowicky A, Griscti J, Levings H, & Kilbride C (2019). The feasibility, acceptability and preliminary efficacy of a low-cost, virtual-reality based, upper-limb stroke rehabilitation device: A mixed methods study. Disability and Rehabilitation, pp. 2119-2134. https://doi.org/10.1080/09638288.2018.1459881.
[11] Maciejewski M, Piszczek M, Pomianek M, & Palka N (2020). Design and evaluation of a SteamVR tracker for training applications: simulations and measurements. Metrology and Measurement Systems, pp. 601-614. http://dx.doi.org/10.24425/mms.2020.134841.
[12] Yang D, Kim D, & Lee S H (2021). LoBSTr: Real-time lower-body pose prediction from sparse upper-body tracking signals. Computer Graphics Forum, pp. 265-275. https://doi.org/10.48550/arXiv.2103.01500.
[13] Qiu S, Zhao H K, Jiang N, Wu D H, Song G C, Zhao H Y, & Wang Z L (2022). Sensor network oriented human motion capture via wearable intelligent system. International Journal of Intelligent Systems, pp. 1646-1673. http://dx.doi.org/10.1002/int.22689.
[14] Kumar M S, & Mohan S (2023). Selective fruit harvesting: Research, trends and developments towards fruit detection and localization: A review. Journal of Mechanical Engineering Science, pp. 1405-1444. https://doi.org/10.1177/09544062221128443.
[15] Lin J C, Li S, Qin H, Wang H C, Cui N, Jiang Q, Jian H F, & Wang G M (2023). Overview of 3D human pose estimation. CMES-Computer Modeling in Engineering & Sciences, pp. 1621-1651. https://doi.org/10.32604/cmes.2022.020857.
[16] Binsch O, Oudejans N, van der Kuil M N A, Landman A, Smeets M M J, Leers M P G, & Smit A S (2022). The effect of virtual reality simulation on police officers' performance and recovery from a real-life surveillance task. Multimedia Tools and Applications, pp. 17471-17492. https://doi.org/10.1007/s11042-022-14110-5.
[17] Lovanshi M, & Tiwari V (2024). Human skeleton pose and spatio-temporal feature-based activity recognition using ST-GCN. Multimedia Tools and Applications, pp. 12705-12730. https://doi.org/10.1007/s11042-023-16001-9.
[18] Brock H, Law F, Nakadai K, & Nagashima Y (2020). Learning three-dimensional skeleton data from sign language video. ACM Transactions on Intelligent Systems and Technology, pp. 30. https://doi.org/10.1145/3377552.
[19] Walters R K, Gale E M, Barnoud J, Glowacki D R, & Mulholland A J (2022). The emerging potential of interactive virtual reality in drug discovery. Expert Opinion on Drug Discovery, pp. 685-698. https://doi.org/10.1080/17460441.2022.2079632.
[20] Zheng C, Wu W H, Chen C, Yang T J N, Zhu S J, Shen J, Kehtarnavaz N, & Shah M (2024). Deep learning-based human pose estimation: A survey. ACM Computing Surveys, pp. 11. https://doi.org/10.1145/3603618.