https://doi.org/10.31449/inf.v48i5.5406 Informatica 48 (2024) 15 –22 15 A Study on the Recognition of Typical Movement Characteristics of Ethic Folk Dances Based on Movement Data Ying Wang School of Dance, Northwest Normal University, Lanzhou, Gansu 730070, China Corresponding address: No. 967, Anning East Road, Anning District, Lanzhou City, Gansu 730070, China Email: wc23470@163.com Keywords: folk dance, typical movement, motion data, feature recognition, skeleton joint Received: November 8, 2023 Ethnic folk dances possess significant cultural value and require documentation and preservation. This article begins by recognizing the distinctive movement characteristics found in ethnic folk dances. It then collects skeletal motion data of the human body while executing typical movements in ethnic folk dances using Kinect V2. Two primary features, namely angle and relative distance, were extracted. Deep learning was combined with the attention mechanism to design a three-layer BiLSTM-attention method. Experiments were conducted using the typical movement feature set of ethnic folk dance and the MSR- Action3D dataset. It was found that the three-layer BiLSTM method exhibited superior performance when compared to other configurations of BiLSTM layer. Additionally, the results derived from the BiLSTM model surpassed those achieved with RNN or LSTM models. Furthermore, the inclusion of the attention layer led to a noteworthy 0.0234 increase in the ACC value compared to models without it. The processed features demonstrated enhanced performance compared to the raw skeletal motion data. ACC values exceeding 0.95 were achieved for the recognition of typical movement features in various types of ethic folk dances. Notably, the ACC value of the three-layer BiLSTM method for the MSR-Action3D dataset was 0.9767, which was superior to the other methods. These outcomes validate the robustness of the methodology presented in this paper for recognizing typical movement features in folk dance and suggest its potential for practical applications. Povzetek: Raziskava predstavlja analizo značilnih gibalnih vzorcev ljudskih plesov s pomočjo tri nivojske arhitekture z globokim učenjem. 1 Introduction Dance, as a performing art, has evolved significantly in response to societal developments. Its styles have become increasingly diverse, encompassing traditional folk dance, modern dance, jazz dance, and more. As a result, it has gained popularity among a growing number of enthusiasts [1]. Ethnic folk dance, in particular, emerges from the unique cultural and regional influences of each nation, encapsulating rich national culture and spirit. Consequently, the study of ethnic folk dance plays a pivotal role in preserving and spreading cultural heritage. Typically, each ethnic folk dance has its own unique movements and gestures. By recognizing these characteristic movement features, it is possible to classify and document various ethnic folk dances. However, recognizing the distinctive features of ethnic folk dances poses great difficulty due to the complex variations in their movements. Fortunately, with the continuous advancement of emerging technologies like sensors and computers, an increasing number of methods have been applied to the field of movement recognition [2]. 2 Related works According to Table 1, currently in the field of action recognition, most research focuses on human daily activities, with some studies also exploring gesture movements. However, there is relatively little research on dance movements. Furthermore, when it comes to discussing the application of deep learning methods in movement recognition, there is still potential for further improving recognition effectiveness. Additionally, existing features mostly come from videos and sensors, with limited consideration given to skeletal motion data. Therefore, this article focuses on the recognition of typical movement characteristics in ethnic folk dance. Based on Kinect device to acquire skeletal motion data, a method using long short-term memory (LSTM) neural network was designed. Through experimental analysis, the effectiveness of this method has been proven, providing a new approach for recording and inheriting ethnic folk dance and offering theoretical support for further application of Kinect device in movement recognition. Table 1: A summary table of related works Literat ure Feature Recognition method Results Gao et al. [3] Integration of deep video and RGB trichromatic features A global coding algorithm and a convolutional neural An average classification accuracy rate of 85.79% 16 Informatica 48 (2024) 15 –22 Y. Wang network Barko ky et al. [4] Complex network- based features extracted from RGB- D data Meta-paths in complex networks Good recognition performance on both MSR- Action Pairs and MSR Daily Activity3D Li et al. [5] M-mode ultrasound A support vector machines (SVM) and a back- propagation neural network (BPNN) The average classification accuracy was 98.83% ± 1.03% for SVM and 98.70% ± 0.99% for BPNN. Athav ale et al. [6] Daily human activity signals recorded by cell phone acceleromet ers VGG16-SVM An accuracy of 79.55% and an F-value of 71.63% 3 Skeleton motion data In the realm of movement recognition, commonly utilized motion data include RGB data, and optical flow data [7]. Skeleton data comprises the 3D coordinates of skeletal joints and offers several advantages, such as smaller dimensions, ease of acquisition, and good stability. This type of data can be directly obtained through depth sensors like Kinect [8], which circumvent issues associated with traditional camera-based motion data collection, such as susceptibility to light and background interference. Consequently, skeleton data has gained increasing prominence in the field of movement recognition. This paper predominantly utilizes the Kinect sensor as the device for acquiring the motion data to facilitate the capture of skeletal motion data in the context of folk dance. The performance of Kinect V2 has been improved in all aspects compared to Kinect V1. The comparison of the two devices is shown in Table 2. It can be found that the detection range of Kinect V2 is larger and the number of joints detected is more. As a result, it has better performance in extracting skeleton motion data. This study used the Kinect V2 as the acquisition device. Table 2: Comparison of two generations of Kinect devices Equipment parameters Kinect V1 Kinect V2 Number of people 6 6 Number of detected joints 20 joints/person 25 joints/person Effective detection range 0.8-4.0 m 0.5-4.5 m Horizontal angle detection range 57° 70° Vertical angle detection range 43° 60° USB interface 2.0 3.0 APP Single Multiple Kinect achieves data acquisition and processing through the Kinect SDK. The Kinect SDK comprising various tools and software libraries, serving as the foundation for human-computer interaction development. The process of extracting skeletal motion data is as follows. The Kinect SDK analyzes a single depth image, classifies different body parts using a random decision forest to determine whether each pixel belongs to a skeletal joint, and then updates the three-dimensional coordinates in real time. This algorithm operates at a speed of approximately 5 milliseconds per frame, providing robust real-time performance and effectively meeting practical requirements. The 25 joints extracted by the Kinect V2 are shown in Figure 1. 2 3 20 1 0 12 16 17 13 14 18 15 19 4 8 5 9 6 10 22 7 21 11 24 23 Figure 1: Kinect V2 skeletal joint nodes. The 25 joint nodes shown in Figure 1 contain a detailed division of hand and foot joint nodes, but in the actual application process, too many joint nodes may instead lead to an increase in the amount of computation and noise; A Study on the Recognition of Typical Movement Characteristics … Informatica 48 (2024) 15 –22 17 therefore, considering the actual movement of the human body in the process of accomplishing the folk dance, this paper simplified the skeletal joint nodes. Finally, the 15 joint nodes shown in Table 3 were used for computation. Table 3: Simplified 15 skeletal joint nodes Serial number Name 0 Spine base 3 Head 4 Shoulder left 5 Elbow left 6 Wrist lest 8 Shoulder right 9 Elbow right 10 Wrist right 12 Hip left 13 Knee left 14 Ankle left 16 Hip right 17 Knee right 18 Ankle right 20 Spine shoulder The Kinect coordinate system is written as (𝑥 𝑘 ,𝑦 𝑘 ,𝑧 𝑘 ), and the human body coordinate system is written as (𝑥 ℎ ,𝑦 ℎ ,𝑧 ℎ ). It is assumed that the mapping of the angle between the shoulder line and 𝑋 𝑘 on the 𝑋 𝑘 𝑂 𝑍 𝑘 plane is 𝛼 , then the skeletal joint node is converted from the Kinect coordinate system to the human body coordinate system. The equation is: [ 𝑥 ℎ 𝑦 ℎ 𝑧 ℎ ] = [ cos𝛼 0 −sin𝛼 0 1 0 sin𝛼 0 cos𝛼 ] [ 𝑥 𝑘 𝑦 𝑘 𝑧 𝑘 ]. The processed skeletal joint nodes' coordinates have certain limitations if directly used as the input for subsequent movement feature recognition methods, which would require a significant amount of computation and increase computational complexity. Therefore, the coordinate data need to be processed again to extract more representative features. In feature extraction, it is important to consider that the features should not vary significantly due to differences in human skeletal structure, while still effectively reflecting changes in human movement. During the execution of ethnic folk dances, various joints of the body exhibit different angles, and these angles also differ for different dance movements. Additionally, there are certain patterns in the variations of relative distances. This paper extracted the following two types of features in terms of angle and distance considerations. (1) Angles. During the movement process, the skeletal joint nodes will have different sizes of angles. As shown in Figure 2, taking the angle formed by joint nodes 4, 5, and 6 as an example, it consists of two vectors, i.e., the vector formed by joint nodes 5 and 4, and the vector formed by joint nodes 5 and 6. The former is defined as r i , and the latter is defined as r j . The formula for 𝜃 can be written as: 𝜃 𝑛 = 𝑎𝑟𝑐𝑐𝑜𝑠 𝑟 𝑖 1 ×𝑟 𝑗 1 +𝑟 𝑖 2 ×𝑟 𝑗 2 +𝑟 𝑖 3 ×𝑟 𝑗 3 √𝑟 𝑖 1 2 +𝑟 𝑖 2 2 +𝑟 𝑖 3 2 +√𝑟 𝑗 1 2 +𝑟 𝑗 2 2 +𝑟 𝑗 3 2 . By this method, the 15 angles in Figure 2 can be calculated. Figure 2: Schematic diagram of joint angles. (2) Relative distance. During the movement process, the spatial position of the skeletal joint nodes will also change. For example, when performing various different folk dances, the hands, feet, and human spine have different relative distances; therefore, this paper considers joint node 0, i.e., the spine base, as the center point. The calculation formula is: 𝐷 𝑖𝑗 = √ (𝑥 𝑖 − 𝑥 𝑗 ) 2 + (𝑦 𝑖 − 𝑦 𝑗 ) 2 + (𝑧 𝑖 − 𝑧 𝑗 ) 2 . According to the above equation, the relative distance between the remaining 15 joints and the center point can be calculated. To avoid the influence of individual differences on the results, the calculated relative distances were normalized: 𝐷 𝑖𝑗 ′ = 𝐷 𝑖 𝑗 𝑑 where 𝑑 is the distance between joint node 20 and joint node 0 in Figure 2. 18 Informatica 48 (2024) 15 –22 Y. Wang 4 Motion feature recognition method The recurrent neural network (RNN) has proven to be highly effective in addressing temporal problems [9]. However, RNNs suffer from the problems of vanishing and exploding gradients [10], as well as limitations in handling long input sequences and preserving long-term dependencies. In contrast, LSTM is a type of RNN that can learn long-term dependencies [11]. They not only alleviate the issues of vanishing and exploding gradients but also effectively filter temporal information through gate structures, allowing for flexible retention of important information. Given that skeletal movement data inherently contains abundant temporal information, this paper leveraged the LSTM to design a movement feature recognition method. LSTM realizes the processing of information through three gates, and its application to the recognition of movement features of folk dance can obtain better results. It is assumed that the input of the current moment is 𝑥 𝑡 , the output of the previous moment is ℎ 𝑡 −1 , and LSTM measures the degree of updating the input information through input gate i t : 𝑖 𝑡 = 𝜎 (𝑈 𝑖 ∙ [ℎ 𝑡 −1 ,𝑥 𝑡 ] + 𝑏 𝑖 ). Forget gate 𝑓 𝑡 is used to control the forgetting degree of historical information, which can be written as: 𝑓 𝑡 = 𝜎 (𝑈 𝑓 ∙ [ℎ 𝑡 −1 ,𝑥 𝑡 ] + 𝑏 𝑓 ). At time t, the update formula of a memory unit can be written as: 𝑐 𝑡 = 𝑓 𝑡 × 𝑐 𝑡 −1 + 𝑖 𝑡 × tanh(𝑈 𝑐 ∙ [ℎ 𝑡 −1 ,𝑥 𝑡 ] + 𝑏 𝑐 ). Output gate o t is used to control the output quantity of output value h t in the LSTM, which can be written as: 𝑜 𝑡 = 𝜎 (𝑈 𝑜 ∙ [ℎ 𝑡 −1 ,𝑥 𝑡 ] + 𝑏 𝑜 ). Finally, output gate o t and unit state c t jointly determine the output of LST. It is expressed by the following equation: ℎ 𝑡 = 𝑜 𝑡 tanh(𝑐 𝑡 ), where 𝑈 and 𝑏 denote the weight and bias term of each layer. LSTM has a limitation in that it cannot encode information from the backward to forward. The human skeletal motion data is very complex. Unidirectional LSTM can only process the motion data in one direction, while BiLSTM can simultaneously capture all information from both the forward and backward directions. In order to extract more features from the data, this paper adopted BiLSTM, which learns features through a forward LSTM and a backward LSTM. It is expressed as: ℎ 𝑡 ⃗⃗⃗ = 𝐿𝑆𝑇𝑀 ⃗⃗⃗⃗⃗⃗⃗⃗⃗⃗⃗ (ℎ 𝑡 −1 ,𝑥 𝑡 ,𝑐 𝑡 −1 ), ℎ 𝑡 ⃖⃗⃗⃗ = 𝐿𝑆𝑇𝑀 ⃖⃗⃗⃗⃗⃗⃗⃗⃗⃗⃗⃗ (ℎ 𝑡 +1 ,𝑥 𝑡 ,𝑐 𝑡 +1 ). Then, the final hidden layer output of BiLSTM is: ℎ 𝑡 = ℎ ⃗ 𝑡 + ℎ ⃖⃗ 𝑡 . In addition, in order to avoid the inadequacy of one- layer BiLSTM in data feature extraction, this paper proposed a multi-layer BiLSTM and combined it with the attention mechanism [12], so as to further improve the performance of movement feature recognition. The established three-layer BiLSTM-attention structure is shown in Figure 3. Angle feature Relative distance feature BiLSTM BiLSTM BiLSTM Attention layer Softmax Figure 3: Three-layer BiLSTM-attention structure. As in Figure 3, the features are inputted into the BiLSTM for learning, and then the output of the previous BiLSTM layer is used as the input of the next BiLSTM layer to further mine the deeper associations between features and strengthen the feature learning ability. Then, output ℎ 𝑡 of the three-layer BiLSTM is sent to the attention layer, and weights are assigned according to the importance of features. The corresponding equations are: ℎ 𝑡 ′ = ∑ 𝑎 𝑡 𝑡 =1 ℎ 𝑡 , 𝑎 𝑡 = 𝑠𝑜𝑓𝑡𝑚𝑎𝑥 (𝑠𝑐𝑜𝑟𝑒 𝑡 ), 𝑠𝑐𝑜𝑟𝑒 𝑡 = 𝑓 (𝑄𝑢𝑒𝑟𝑦 ,ℎ 𝑡 ), where 𝑎 𝑡 is the weight corresponding to each feature vector, which is calculated by 𝑠𝑐𝑜𝑟𝑒 𝑡 , and 𝑓 is a scoring function. The relational degree between ℎ 𝑡 and current object 𝑄𝑢𝑒𝑟𝑦 is calculated to realize the assignment of attention. Then, ℎ 𝑡 ′ output from the attention layer undergoes softmax classification to obtain the final results for ethic folk dance movement feature recognition: 𝑦̂ = 𝑠𝑜𝑓𝑡𝑚𝑎𝑥 (ℎ 𝑡 ′). Using the cross entropy as a loss function, the model is trained with the goal of minimizing the cross entropy: 𝐿 = −∑ 𝑦 𝑖 log𝑦̂ 𝑖 𝐶 𝑖 =1 , where 𝐶 is the movement category. 5 Results and analysis 5.1 Experimental setup The experiments were carried out on a Windows 10 computer, and the algorithm was implemented using the Keras framework. In the three-layer BiLSTM-attention model, the hidden layer comprised 512 neurons. Dropout was used and set as 0.3 to mitigate overfitting. The input nodes were set as 30 dimensions, i.e., there were 30 input nodes. The learning rate was set at 0.001, and the total number of iterations was fixed at 1,000. A Study on the Recognition of Typical Movement Characteristics … Informatica 48 (2024) 15 –22 19 The experimental data comprised two distinct parts. The first part consisted of a typical movement feature set derived from ethnic folk dances, which were collected in the laboratory from 50 dance majors. These dances included five different styles: Uighur, Mongolian, Han, Tibetan, and Dai dances. For each dance style, a representative movement was selected, as depicted in Figure 4. The description of different types of ethnic folk dances is as follows. (1) Uighur dance: The basic characteristics of Uyghur dance in terms of posture are upright head, chest out, and straight waist. It achieves graceful movements through continuous slight tremors or variations in knee movements, as well as decorative actions such as neck movement and wrist flipping. (2) Mongolian dance: Mongolian dance is robust, graceful, and bold, characterized by continuous undulations and a combination of softness and strength. It relies on the coordination of various parts of the body to meet rhythmic requirements. (3) Han dance: The movements of Han ethnic dance are graceful and fluid, with unique hand gestures that showcase the dancers' elegance and strength through body rotations, twists, and bends. The footwork is light and agile, emphasizing precision and rhythm. (4) Tibetan dance: Tibetan dances often involve standing or half-squatting postures, with graceful movements such as jumps and spins. The footwork is agile and varied, allowing the dancers to showcase their flexibility through bending the waist, arching the back, and loosening the hips. (5) Dai dance: Dai ethnic dance showcases the flexibility of the lower legs, characterized by graceful knee movements. It forms a basic rhythm through the bending of arms and various joints in the body, resulting in elegant and intricate hand and foot gestures. Each student refrained from strenuous exercise for 24 hours prior to data collection. The students were then guided by experimental staff to perform the specified movements within the effective range of the Kinect device. Each movement was executed ten times, resulting in the collection of 500 samples per movement and a total of 2,500 samples. If there was any missing action sequence frame, the skeletal data from the previous frame was used for filling. Furthermore, in order to reduce information redundancy in the original data, only one frame was retained for every five frames. To maintain uniformity in the number of frames across samples, the last frame was duplicated to match samples with fewer frames than the maximum frame count. The second part of the experimental data was a widely used dataset in the field of movement recognition known as MSR-Action3D [13]. This dataset included 20 distinct movements performed by ten individuals. As each movement was executed 2-3 times, there were 557 samples of movement. To ensure reliable results, both datasets underwent experimentation utilizing a five-fold cross-validation. Figure 4: Typical movement characteristics of ethnic folk dances (from left to right: Uighur dance: double hat support; Mongolian dance: soft arms; Han dance: push the fan horizontally with both hands diagonally downward; Tibetan dance: peacock reflecting its image by the water's edge; Dai dance: small jumps by stomping and bending legs). The performance of the algorithm was assessed using the accuracy of movement feature recognition, i.e., the ratio of correctly recognized samples to the total samples, which is expressed as: 𝐴𝐶𝐶 = 𝑇𝑃 𝑇𝑃 +𝐹𝑁 , where TP stands for the number of samples that are actually positive and also identified as positive, and 𝐹𝑁 stands for the number of samples that are actually positive but are recognized as negative. 5.2 Analysis of results Firstly, the performance of the three-layer BiLSTM- attention method was analyzed using the feature set of folk dance movements to compare the effect of different BiLSTM layers on the performance. The results are presented in Figure 5. Figure 5: Effect of number of BiLSTM layers on recognition performance. From Figure 5, it can be found that the performance of the algorithm was extremely poor when only one layer of BiLSTM was used, and the ACC value for recognizing folk dance movement features was 0.9126, which indicated that one layer of BiLSTM did not adequately learn the features. Then, as the number of BiLSTM layers increased, the ACC value of the algorithm also increased. It reached a maximum of 0.9667 when there were three BiLSTM layers, which was 0.0541 higher compared with that of one layer. In the case where the number of BiLSTM layers continued to increase, the complexity of the model 20 Informatica 48 (2024) 15 –22 Y. Wang also increased, which brought a burden to the recognition of ethnic folk dance movement features, resulting in a decrease in the ACC value. Specifically, when there were five BiLSTM layers, the ACC value was 0.9416, which was 0.0251 lower compared to three layers. The results in Figure 4 showed that the best results could be obtained when using a three-layer BiLSTM. Then, the effect of replacing BiLSTM with RNN, LSTM, and removing the attention layer on the performance was compared, and the results are displayed in Table 4. Table 4: Effect of BiLSTM layer and attention layer on performance. ACC value Three-layer RNN- attention 0.8962 Three-layer LSTM- attention 0.9189 Three-layer BiLSTM 0.9433 Three-layer BiLSTM- attention 0.9667 From Table 4, when replacing BiLSTM with RNN, the ACC value of the algorithm became 0.8962, which was reduced by 0.0705 compared to the three-layer BiLSTM-attention method. When replacing BiLSTM with LSTM, the ACC value of the algorithm became 0.9189, which was reduced by 0.0478. This suggested that BiLSTM outperformed LSTM in the learning of folk dance movement features. The comparison of the results obtained in the presence and absence of the attention layer demonstrated that the ACC value of the three-layer BiLSTM method without adding the attention layer was 0.9433, which was reduced by 0.0234 compared with adding the attention layer. The result revealed the importance of the attention layer. In the feature processing, this paper chose the angle and relative distance features extracted from the preprocessed skeleton motion data. The effects of different choices of features on the recognition results were compared, and the results are presented in Table 5. Table 5: Effect of feature selection on performance. ACC value Original data 0.8527 Angle features 0.9216 Relative distance features 0.9423 Angle features + relative distance features 0.9667 From Table 5, the ACC value obtained by inputting the original data into the three-layer BiLSTM-attention method for recognition was 0.8527. Under this condition, the training data was too large, which made training difficult. Additionally, the features extracted by the algorithm from the original skeleton movement data were insufficient to effectively distinguish between different categories of ethnic folk dances. As a result, the ACC value was also low. When using angle features or relative distance features individually, the ACC values were 0.9216 and 0.9423, respectively, which were improved by 0.0689 and 0.0896 compared to using the original features. This suggested that the relative distance features contributed more to the recognition of ethnic folk dance movement features and contained more information. Finally, the ACC value of the algorithm in the case of using angle features + relative distance feature was 0.9667, which was improved by 0.114 compared to the case of using the original features. This result proved the reliability for the processing of skeleton movement data. The recognition performance of the three-layer BiLSTM-attention method for different categories of ethnic folk dances in the five-fold cross-validation is shown in Table 6. Table 6: Recognition performance for different categories of folk dances. Uighu r dance Mongoli an dance Han danc e Tibeta n dance Dai danc e 1 0.973 2 0.9824 0.966 1 0.955 2 0.957 7 2 0.982 1 0.9644 0.963 4 0.950 2 0.966 2 3 0.988 2 0.9712 0.951 6 0.953 1 0.965 6 4 0.988 4 0.9778 0.971 7 0.950 7 0.970 3 5 0.984 1 0.9712 0.953 2 0.951 3 0.958 2 Avera ge 0.983 2 0.9734 0.961 2 0.952 1 0.963 6 From Table 6, it can be found that the results of the algorithm obtained in the five experiments were relatively stable, and the differences in ACC values were small. Then, from the comparison of different categories, the algorithm exhibited the best performance in recognizing Uighur dance, reaching an ACC value of 0.9832, followed by Mongolian dance with an ACC value of 0.9734, and it achieved the lowest ACC value (0.9521) when recognizing Tibetan dance. These results demonstrated that the three-layer BiLSTM-attention method achieved ACC values above 0.95 for recognizing various categories of ethnic folk dance movement features. Finally, the method proposed in this paper was compared with other methods in the literature using the MSR-Action3D dataset: (1) differential RNN: an LSTM incorporating differential gating scheme [14]; (2) linear SVM [15]; (3) spatio-temporal LSTM (ST-LSTM) [16]. The comparison of the results is presented in Figure 6. A Study on the Recognition of Typical Movement Characteristics … Informatica 48 (2024) 15 –22 21 Figure 6: Comparison with other methods. The ACC values of these methods for the MSR- Action3D dataset were above 0.9 in Figure 6. In comparison, the differential RNN method had the lowest ACC value (0.9203), while the three-layer BiLSTM- attention method obtained the highest ACC value, which was 0.0564 higher than the differential RNN method, 0.0313 higher than the linear SVM method, and 0.0092 higher than the ST-LSTM method. These results verified the recognition performance of the three-layer BiLSTM- attention method. 6 Discussion Movement recognition is an important research direction in computer vision, aiding computers in identifying human or object movements from data. It has wide applications in fields such as video surveillance, human-computer interaction, and virtual reality. With the advancement of technology, deep learning methods have made significant progress in movement recognition, with models like RNN being widely used for feature extraction and identification of movements. The performance of movement recognition methods depends on the diversity of training data, while data collected based on images or videos is likely to be affected by changes in shooting angles and positions, which can impact the performance of movement recognition. Additionally, the collaborative motion of multiple parts also increases the complexity of movement recognition. Currently, most movement recognition methods are based on daily human activities such as walking and gestures. However, ethnic folk dances involve complex and diverse motion variations, making it difficult for traditional movement recognition approaches to achieve satisfactory performance. Therefore, this study combined skeleton motion data with deep learning methods to investigate the approach for recognizing the motion characteristics of ethnic folk dances. The original skeletal data is too complex and contains a lot of redundant information, which hinders the performance and efficiency of movement feature recognition. Based on the characteristics of ethnic folk dance movements, this paper proposed using joint angles and relative distances as features. To improve the recognition performance of LSTM, this paper designed a three-layer BiLSTM-attention method and conducted experiments on two datasets. Firstly, from the recognition results of ethnic folk dances, both the feature selection method and action recognition method designed in this paper demonstrate excellent performance. They are capable of capturing valuable features for action recognition more effectively from skeletal motion data. The utilization of BiLSTM structure and the addition of an attention layer contribute to improving the accuracy of dance movement recognition. The recognition accuracy for different types of ethnic folk dances was consistently above 0.95. Then, the method designed in this paper was compared with other current recognition methods on the MSR-Action3D dataset, further demonstrating the effectiveness of this approach in movement recognition. The research in this article provides a novel and reliable method for recognizing the movement characteristics of ethnic folk dances, which contributes to the preservation and inheritance of ethnic folk dance culture. It is of great significance for documenting and protecting some endangered ethnic folk dances that are on the verge of being lost. Additionally, this research further enriches the field of movement recognition, promoting the development of deep learning and movement recognition. 7 Conclusion This paper primarily focuses on the movement feature recognition method for folk dance based on skeleton movement data. A three-layer BiLSTM-attention method was developed. Through experimental analysis, it was observed that the algorithm achieved its optimal performance in recognizing movement features of folk dance when employing three BiLSTM layers. Removing the attention layer resulted in a decrease of 0.0234 in the ACC value. The algorithm demonstrated the highest ACC value of 0.9667 when utilizing both the angle and relative distance features. Furthermore, the ACC values for recognizing various types of folk dance movement features consistently exceeded 0.95. Finally, the proposed method outperformed other techniques when the MSR- Action3D dataset was used, further confirming its reliability for movement recognition. The three-layer BiLSTM-attention method holds promise for broader application and implementation in practical contexts. However, there are still some limitations in this study. For instance, the experimental dataset was limited and did not cover a wider range of ethnic folk dance movements. Therefore, in future research, the author will expand the scope and quantity of data collection to further validate the proposed method's reliability in recognizing characteristics of ethnic folk dance movements. Additionally, the author will explore the possibility of applying this method to other fields such as sports movement recognition and investigate methods to improve recognition performance by comparing and analyzing the effects of various deep learning approaches on movement recognition. References [1] Schupp K (2018). Dance Competition Culture and Commercial Dance: Intertwined Aesthetics, Values, and Practices. Journal of Dance Education, 19, pp. 1-10. https://doi.org/10.1080/15290824.2018.1437622. 22 Informatica 48 (2024) 15 –22 Y. Wang [2] Li L (2021). Mirror motion recognition method about upper limb rehabilitation robot based on sEMG. Journal of Computational Methods in Sciences and Engineering, 21, pp. 1021-1029. https://doi.org/10.3233/JCM-204812. [3] Gao P, Zhao D, Chen X (2020). Multi-dimensional data modelling of video image action recognition and motion capture in deep learning framework. IET Image Processing, 14, pp. 1257-1264. https://doi.org/10.1049/iet-ipr.2019.0588. [4] Barkoky A, Charkari N M (2022). Complex Network-based features extraction in RGB-D human action recognition. Journal of Visual Communication & Image Representation, 82, pp. 1- 9. https://doi.org/10.1016/j.jvcir.2021.103371. [5] Li J, Zhu K, Pan L (2022). Wrist and finger motion recognition via M-mode ultrasound signal: A feasibility study. Biomedical Signal Processing and Control, 71, pp. 1-11. https://doi.org/10.1016/j.bspc.2021.103112. [6] Athavale V, Kumar D, Gupta S (2021). Human Action Recognition Using CNN-SVM Model. Advances in Science and Technology, 105, pp. 282- 290. https://doi.org/10.4028/www.scientific.net/AST.105 .282. [7] Ji Y, Yang Y, Shen F, Shen H, Zheng WS (2020). Arbitrary-view Human Action Recognition: A Varying-view RGB-D Action Dataset. IEEE Transactions on Circuits and Systems for Video Technology, 31, pp. 289-300. https://doi.org/10.1109/TCSVT.2020.2975845. [8] Li G, Li C (2020). Learning skeleton information for human action analysis using Kinect. Signal Processing Image Communication, 84, pp. 1-5. https://doi.org/10.1016/j.image.2020.115814. [9] Bueno J, Maktoobi S, Froehly L, Fischer I, Jacquot M, Larger L, Brunner D (2018). Reinforcement Learning in a large scale photonic Recurrent Neural Network. Optica, 5, pp. 1-5. https://doi.org/10.1364/OPTICA.5.000756. [10] Huang Y, Bai C, Li H, Zhang J, Chen S (2020). Image Captioning Based on Conditional Generative Adversarial Nets. Journal of Computer-Aided Design & Computer Graphics, 32, pp. 911-918. https://doi.org/10.3724/SP.J.1089.2020.18003. [11] Kumar J, Goomer R, Singh AK (2018). Long Short Term Memory Recurrent Neural Network (LSTM- RNN) Based Workload Forecasting Model For Cloud Datacenters. Procedia Computer Science, 125, pp. 676-682. https://doi.org/10.1016/j.procs.2017.12.087. [12] Zang Y, Yu Z, Xu K, Chen M, Yang S, Chen H (2023). Fiber communication receiver models based on the multi-head attention mechanism. Chinese Optics Letters, 21, pp. 1-6. https://doi.org/10.3788/COL202321.030602. [13] Gharahdaghi A, Razzazi F, Amini A (2021). A non- linear mapping representing human action recognition under missing modality problem in video data. Measurement, 186, pp. 1-10. https://doi.org/10.1016/j.measurement.2021.110123. [14] Veeriah V, Zhuang N, Qi GJ (2015). Differential Recurrent Neural Networks for Action Recognition. 2015 IEEE International Conference on Computer Vision (ICCV), IEEE, Santiago, Chile, pp. 4041- 4049, https://doi.org/ 10.1109/ICCV.2015.460. [15] Vemulapalli R, Arrate F, Chellappa R (2014). Human Action Recognition by Representing 3D Skeletons as Points in a Lie Group. 2014 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Columbus, OH, USA, pp. 588- 595, https://doi.org/ 10.1109/CVPR.2014.82. [16] Liu J, Shahroudy A, Xu D, Kot AC, Wang G (2017). Skeleton-Based Action Recognition Using Spatio- Temporal LSTM Network with Trust Gates. IEEE Transactions on Pattern Analysis & Machine Intelligence, 40, pp. 3007-3021. https://doi.org/10.1109/TPAMI.2017.2771306.