https://doi.org/10.31449/inf.v48i8.5781
Informatica 48 (2024) 17–34

Basketball Fixed-point Shooting Hit Prediction Based on Human Pose Estimation Algorithm

Xi Li 1, Jiao Hua 2
1 Department of Physical Education, Wuxi Taihu University, Wuxi 214000, China
2 Physical Education Group, Wuxi Yangming Center Primary School, Wuxi 214000, China
E-mail: 000058@wxu.edu.cn, tiandiboy2000@163.com

Keywords: artificial intelligence, human pose estimation algorithm, fixed-point shooting, object detection, you only look once version 5

Received: February 28, 2024

As computer vision and artificial intelligence develop, basketball fixed-point shooting hit prediction based on human pose estimation has become a topic of great concern. To construct a basketball fixed-point shooting hit prediction model, a new object detection algorithm was designed by optimizing the You Only Look Once version 5 (YOLOv5) algorithm with the GIoU loss function and the Convolutional Block Attention Module. Then, a new human pose estimation algorithm was designed based on the OpenPose algorithm. Results showed that the average accuracy of the improved YOLOv5 algorithm reached 95.34% when the number of iterations was 50. In the comparison between the improved OpenPose and other algorithms, the improved OpenPose performed better, with a recall rate of 96.23%, an accuracy of 87.16%, a precision of 89.75%, and an F1 value of 88.19%. In the comparison with other models, the area under the receiver operating characteristic curve of the research model was the largest, reaching 0.974, and its F1 value, accuracy and recall rate were the highest, reaching 95.54%, 96.39% and 98.25%, respectively. The results show that the model effectively predicts hits in basketball fixed-point shooting, which provides a useful reference for tactical analysis and player performance evaluation in basketball games.
Povzetek (Slovenian summary, translated): The study presents a prediction model for hits in basketball shooting from a fixed point, based on a human pose estimation algorithm, which is useful for tactical analysis and player evaluation in basketball.

1 Introduction

Basketball is a competitive sport that combines speed, strength, skill and tactics and requires a high degree of skill and strategic awareness. In basketball, fixed-point shooting is a common means of scoring, and predicting the shooting hit rate is of great significance for the team's tactical layout and the improvement of players' skills [1]. Human pose estimation is the recognition and analysis of human pose information in images or videos through computer vision technology [2]. Through human pose estimation, key information such as the player's body posture, action characteristics, and movement trajectory during the shooting process can be obtained [3, 4]. This information can be further used to analyze the relationship between a player's skill level and shooting hit rate [5].

The purpose of this study is to use the human pose estimation algorithm to predict the hit rate of basketball fixed-point shooting. To this end, the study analyzes the posture and action characteristics of players during shooting and establishes a prediction model using machine learning. The significance of the study is that, by predicting the shooting hit rate, coaches and players can adjust their shooting strategies to improve the winning rate of the game. In addition, it provides new ideas and methods for applying computer vision and artificial intelligence in sports competition.

The research content includes four parts. The first part introduces the human pose estimation algorithm and the fixed-point shooting hit prediction model in detail. The second part applies the You Only Look Once version 5 (YOLOv5) object detection algorithm and optimizes it.
Then, based on the OpenPose algorithm, a new human pose estimation algorithm is designed. Finally, a basketball fixed-point shooting hit prediction system is designed. The third part verifies the validity and reliability of the research model through experimental design and data analysis. The fourth part summarizes the research and discusses future prospects.

2 Related works

Human pose estimation has been an important topic in computer vision in recent decades, and it plays a crucial role in understanding people in images and videos. At present, there have been many studies on human pose estimation. Dubey and Dixit reviewed the key research and recent advances in human pose estimation, including 2D and 3D pose estimation techniques and their traditional and deep learning methods. The results showed that these different pose estimation methods effectively improved the accuracy and efficiency of pose prediction [6]. Liu et al. proposed a lightweight pose estimation network using the polarimetric self-attention mechanism. Firstly, ghost convolution reduced the parameters of the feature extraction network. Secondly, the polarimetric self-attention module handled the pixel-level regression task, compensated for the insufficient feature extraction caused by parameter reduction, and improved the regression accuracy of human key points. Finally, a new coordinate decoding method reduced the error during heat map decoding and improved the accuracy of key point regression. Results showed that it reduced model parameters with only a small loss of accuracy [7]. Qin et al. proposed a lightweight human pose estimation network named CVC-Net, aiming to reduce complexity and improve human pose detection speed. CVC-Net was based on the stacked hourglass network architecture, used Res2Net_depth residual blocks to reduce the parameters, and combined the channel attention mechanism with PixelShuffle upsampling to optimize performance.
Results showed that CVC-Net significantly reduced the model parameters while maintaining high accuracy, which was especially suitable for devices with limited computing power [8]. Xu et al. proposed a novel network called multi-scale position augmentation network to improve small and medium-scale key point detection and semantic confusion discrimination in human pose estimation. The network adopted a multi-scale adaptive fusion unit and a position enhancement module to emphasize the real joint position characteristics and improve the detection accuracy. Experimental results showed that network performance was significantly improved in pose estimation tasks, and results were more accurate and reliable [9]. Jiang proposed a deep learning-based basketball game monitoring system to collect data non-invasively to identify player behavior during the game. The system used a video frame to analyze the rebounding situation and predict the player's position to get the rebound. In addition, the position of the player was determined by traditional regression techniques and their movement towards the point of landing. Simulation analysis of the feasibility, performance, and system efficiency demonstrated the reliability of the framework [10]. Özkan studied the electrophysiological basis of predicting free throw hits in wheelchair basketball players. Their predictions of hitting or missing free throws were observed by conducting EEG tests on semi-professional wheelchair basketball players and non-professionals. The results of the study showed that expert players exhibited significant negative amplitude in the 100 milliseconds before the release of the free throw, and this electrophysiological response was regarded as a valid indicator for predicting the effect of the action [11]. Siemon and Jörn used Twitter data mining technology to predict the performance of NBA basketball players. 
The study conducted automated personality mining on the tweets of 185 professional players and collected their top five personality traits and player statistics. Correlation and multiple linear regression analyses found that personality traits such as extraversion, agreeableness, and conscientiousness were associated with basketball performance and could be used to predict future performance [12]. Naik and Hashmi discussed the ability to predict the trajectory of dynamic objects in a dynamic sports environment, especially in basketball, and proposed a dual-mode exponential normal distribution processing method with a relational network to accurately predict the trajectory of the basketball. Results showed that the shooting status of athletes was predicted well [13]. The results of the literature survey are shown in Table 1.

Table 1: Results of the literature survey

| Literature number | Accuracy (%) | Recall (%) | Advantage | Limitation |
|---|---|---|---|---|
| [6] | / | / | Summarizes the current progress in pose estimation research | / |
| [7] | 84 | 94 | Reduces the algorithm parameters | The algorithm loss is not effectively controlled |
| [8] | 78 | 94 | Improves the applicability of the model | The model accuracy is reduced |
| [9] | 91 | 95 | The algorithm performance is improved significantly | The model increases |
| [10] | 89 | 92 | Can accurately predict the location of the athletes | The model calculation is more time-consuming |
| [11] | / | / | New indicators of action prediction are found | No method is proposed according to the new indicators |
| [12] | / | / | Found potential associations of personality traits with basketball performance | / |
| [13] | 84 | 96 | Players' shooting status can be predicted according to the basketball movement trajectory | The model calculation is complicated and less efficient |

In summary, significant progress has been made in human pose estimation, especially with the support of deep learning.
From Dubey and Dixit's [6] review to specific algorithm innovations such as Liu et al.'s [7] lightweight network, Qin et al.'s [8] CVC-Net, and Xu et al.'s [9] multi-scale position augmentation network, these studies have demonstrated efforts to improve estimation accuracy and computational efficiency. In addition, Jiang and Özkan [10, 11] showed the practical value of the technique by applying it to specific scenarios, such as basketball motion monitoring and behavior prediction for wheelchair basketball players. These studies have improved the accuracy and efficiency of pose estimation and expanded its application in sports, medicine, and other fields. In view of this, this paper constructs a basketball fixed-point shooting hit prediction model based on the human pose estimation algorithm.

3 Construction of basketball fixed-point shooting hit prediction model based on human pose estimation algorithm

To construct a basketball fixed-point shooting hit prediction model, this study first adopted the YOLOv5 object detection algorithm and introduced the GIoU loss function to optimize it. The Convolutional Block Attention Module (CBAM) attention mechanism was then added to reduce the interference of the environment on object detection. Next, a new human pose estimation algorithm was designed based on the OpenPose algorithm. Finally, a basketball fixed-point shooting hit prediction system was designed.

3.1 Basketball object detection method using deep learning

In sports, human object detection can identify athletes on the field and can even detect and identify referees and spectators. To judge the performance and technique of athletes in competition, a technology is needed that can track athletes and detect and identify their movements. YOLOv5 is an object detection algorithm with fast speed, high accuracy and strong real-time performance.
YOLOv5 is more accurate in detecting small objects and can adapt to a variety of scenarios and task requirements [14]. Compared with other object detection algorithms, YOLOv5 is more concise, efficient, stable, and easy to expand and optimize [15-17]. In view of this, the YOLOv5 network is used to detect fixed-point shooting in basketball games, with the aim of detecting and locating basketball players and then recognizing their posture, as shown in Figure 1.

[Figure 1: YOLOv5 network architecture. Three parts: BackBone, Neck, Prediction. Blocks shown: Focus, CBL, CSP1-1, CSP1-3, SPP, CSP2-1, Up sample, Concat, Conv. Output sizes: 20×20×21, 40×40×21, 80×80×21.]

In Figure 1, YOLOv5 is an object detection algorithm consisting of a backbone, a neck, and a prediction structure. In the backbone, it uses a feature pyramid structure for feature extraction. This structure can effectively fuse shallow and deep features and detect targets of different sizes [18, 19]. In addition, YOLOv5 can use lightweight convolutional neural networks, such as MobileNetV3, as the backbone to further reduce network complexity and improve the running speed. In the neck, YOLOv5 adopts a top-down path fusion structure to shorten the path along which low-level features flow to the prediction layer. This structure can effectively reduce computation and improve the efficiency of network operation. In the prediction structure, YOLOv5 integrates the features, fuses different low-level features along three paths (prediction1, prediction2 and prediction3), and outputs the bounding box information and category information of the target. This design can improve the detection accuracy of the network and its robustness to targets of different sizes and poses [20].
In this study, the GIoU loss function is used to optimize the traditional YOLOv5. The IoU loss function used by the traditional YOLOv5 algorithm is shown in equation (1).

$$L_{IoU} = 1 - IoU(A, B) \tag{1}$$

In equation (1), $A$ and $B$ denote the prediction box and the target box. If the two boxes do not intersect, the value of IoU is 0, so the relationship between the two boxes cannot be measured, training and learning cannot be carried out, and the regression effect cannot be evaluated [21]. The GIoU function solves the gradient problem caused by disjoint boxes, as shown in equation (2).

$$L_{GIoU} = 1 - IoU(A, B) + \frac{|C - (A \cup B)|}{|C|} \tag{2}$$

In equation (2), $C$ is the smallest box enclosing $A$ and $B$. GIoU pays attention to the blank areas in addition to the overlap between the target box and the prediction box, which better reflects the degree of overlap [22]. However, to improve the convergence speed, a penalty term is added, as shown in equation (3).

$$L_{DIoU} = 1 - IoU(A, B) + \frac{\rho^{2}(A_c, B_c)}{l^{2}} \tag{3}$$

In equation (3), $A_c$ and $B_c$ represent the center points of the prediction box and the target box, $\rho(\cdot,\cdot)$ is the distance between them, and $l$ represents the diagonal length of the minimum area enclosing the target box and the prediction box. With this term, the convergence speed can be optimized even if the two boxes do not coincide [23]. The aspect ratio of the prediction box is also taken into account, as shown in equation (4).

$$L_{CIoU} = 1 - IoU(A, B) + \frac{\rho^{2}(A_c, B_c)}{l^{2}} + \alpha v \tag{4}$$

In equation (4), $\alpha$ represents the weight function. The expression of $v$ is shown in equation (5).

$$v = \frac{4}{\pi^{2}} \left( \arctan\frac{w_A}{h_A} - \arctan\frac{w_B}{h_B} \right)^{2} \tag{5}$$

In equation (5), $v$ represents the similarity between the aspect ratios of the detection box and the target box, where $w$ and $h$ are the width and height of each box. The purpose of this penalty term is to make the length and width of the prediction box approach those of the target box quickly [24]. Additionally, to improve detection accuracy, CBAM is introduced, which is expected to reduce the interference of the environment on target detection. The CBAM structure is shown in Figure 2.
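The loss functions of equations (1) to (5) can be illustrated with a minimal sketch for a single pair of axis-aligned boxes. The function name `iou_family_losses` and the `(x1, y1, x2, y2)` box layout are illustrative assumptions, and the weight $\alpha = v / ((1 - IoU) + v)$ is taken from the usual CIoU formulation because the text only calls it a weight function.

```python
import math


def iou_family_losses(a, b):
    """Return the IoU, GIoU, DIoU and CIoU losses for two boxes.

    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2 (assumed layout).
    """
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection and union areas.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = area_a + area_b - inter
    iou = inter / union

    # Smallest enclosing box C and its squared diagonal l^2.
    cx1, cy1 = min(ax1, bx1), min(ay1, by1)
    cx2, cy2 = max(ax2, bx2), max(ay2, by2)
    area_c = (cx2 - cx1) * (cy2 - cy1)
    diag2 = (cx2 - cx1) ** 2 + (cy2 - cy1) ** 2

    # Squared distance between the two box centres, rho^2(A_c, B_c).
    rho2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
        + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2

    # Aspect-ratio term v of equation (5).
    v = 4 / math.pi ** 2 * (
        math.atan((ax2 - ax1) / (ay2 - ay1))
        - math.atan((bx2 - bx1) / (by2 - by1))
    ) ** 2
    # Assumed weight: alpha = v / ((1 - IoU) + v), with alpha = 0 when v = 0.
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0

    return {
        "iou": 1 - iou,
        "giou": 1 - iou + (area_c - union) / area_c,
        "diou": 1 - iou + rho2 / diag2,
        "ciou": 1 - iou + rho2 / diag2 + alpha * v,
    }
```

For two disjoint boxes the IoU loss saturates at 1, while the GIoU and DIoU losses still grow with the gap between the boxes, which is the gradient issue described above.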
[Figure 2: CBAM attention mechanism structure. Input Feature, Channel Attention Module, Spatial Attention Module, Refined Feature.]

In Figure 2, CBAM is an attention model for convolutional neural networks. It realizes the weighted fusion and selection of input features by applying the spatial attention mechanism and the channel attention mechanism to the feature map, to improve object detection accuracy and efficiency. The attention mechanism of CBAM consists of sub-modules, namely the spatial and channel attention modules [25]. After the feature map is input, weighted attention maps are generated by the two modules, and the weights in these maps are multiplied by the feature values to obtain a feature map in which the prominent features are emphasized. The channel attention module is shown in Figure 3.

[Figure 3: Channel attention module. Input Feature F, MaxPool, AvgPool, Shared MLP, Channel Attention M_C.]

Figure 3 illustrates a technique that enhances the performance of convolutional neural networks. In CBAM, the channel attention mechanism first performs maximum pooling and average pooling in the spatial dimension to obtain the maximum and average values of each channel. Then, the two vectors are passed through the shared MLP and added together. Finally, the result is mapped to [0,1] through the sigmoid function to obtain the channel attention vector. This vector is used to weight the input feature map so that important features are emphasized. The spatial attention module is shown in Figure 4.

[Figure 4: Spatial attention module. Channel-Refined Feature F', [MaxPool, AvgPool], Spatial Attention M_S.]

In Figure 4, the spatial attention mechanism generates two two-dimensional spatial feature maps by taking the maximum value and the average value of the input feature map along the channel dimension.
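The channel attention step described for Figure 3 can be sketched in plain Python as follows. This is only an illustration under assumptions: the shared MLP is taken to be a two-layer perceptron with a ReLU hidden layer, as in the usual CBAM formulation, and the weights `w1`, `w2` and the tiny feature map are made-up values rather than anything from the study.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def shared_mlp(vec, w1, w2):
    """Two-layer perceptron with ReLU; w1 is (hidden x C), w2 is (C x hidden)."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(feature, w1, w2):
    """feature is a list of C channels, each a list of H rows of W floats."""
    flat = [[x for row in channel for x in row] for channel in feature]
    # Spatial max pooling and average pooling give one value per channel.
    max_vec = [max(channel) for channel in flat]
    avg_vec = [sum(channel) / len(channel) for channel in flat]
    # Both pooled vectors go through the same MLP and are added element-wise.
    summed = [m + a for m, a in zip(shared_mlp(max_vec, w1, w2),
                                    shared_mlp(avg_vec, w1, w2))]
    # The sigmoid maps each channel weight into (0, 1).
    weights = [sigmoid(s) for s in summed]
    # Each channel of the input is scaled by its attention weight.
    refined = [[[x * weight for x in row] for row in channel]
               for channel, weight in zip(feature, weights)]
    return weights, refined


# Hypothetical 2-channel, 2x2 feature map and MLP weights (hidden size 1).
feature = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
w1 = [[0.5, -0.5]]
w2 = [[1.0], [-1.0]]
weights, refined = channel_attention(feature, w1, w2)
```

With these toy weights the first channel receives a weight close to 1 and the second a weight close to 0, so the refined map keeps the informative channel and suppresses the empty one.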
The two spatial maps are concatenated along the channel dimension to form a new feature map with 2 channels. Then, a convolutional layer is used to reduce the dimensionality of this feature map and generate a single-channel spatial attention map [26]. Finally, the sigmoid function is applied to the spatial attention map to obtain the final weights. Introducing the attention mechanism on the basketball court lets the recognition algorithm cover more details of the player's body and pay more attention to key information, so as to achieve higher accuracy in recognizing the player's fixed-point shooting.

3.2 Construction and system design of basketball fixed-point shooting prediction model

Based on the object detection of basketball players, the pose estimation of the human body is studied further, and the basketball fixed-point shooting hit prediction model is constructed. The OpenPose algorithm is a deep learning method developed by Carnegie Mellon University in the United States. It can detect the joint points of all people in an image or video and connect these joint points to form a skeleton map of the human body. The algorithm is highly robust and is suitable for single-person and multi-person settings and for various scenarios such as behavior recognition [27]. In view of this, a trajectory optimization recognition method based on the OpenPose algorithm is proposed, which detects and recognizes the posture of basketball players and finally uses the support vector machine algorithm for classification. Firstly, 18 human body joint points are selected as the output to predict the pose of the basketball player, and the labeling result of OpenPose is shown in Figure 5.
[Figure 5: OpenPose joint output points. Joints are numbered 0 to 17; the angle between the upper arm and the body is marked.]

Figure 5 shows all the object data in the skeletal key point annotation, where the skeletal key points are represented as an array of length 2×K, and K is the total number of skeletal key points defined for that category. These 18 skeletal key points are connected by 19 connecting lines that form the body skeleton, and they correspond to specific locations in the joints of the human body, including the nose, ears, elbows, eyes, wrists, shoulders, knees, hips, and ankles. Figure 6 shows the OpenPose network structure.

[Figure 6: OpenPose network architecture diagram. VGG19 feeds Stage1, Stage2, and later stages; each stage has two Convs branches that output Pcm and Paf, with losses Loss-pcm and Loss-paf.]

In Figure 6, the VGG19 backbone network extracts image features, which are then passed through a series of stage modules. Each module has the same structure and function. These modules include two branches: one generates the part confidence map (PCM), and the other generates the part affinity field (PAF). The loss is calculated for PCM and PAF at each stage. While the first stage can theoretically output complete information, multiple stages are used in practice. This is because semantic information is shared between key points, and later stages can refine the detection results based on the information extracted by previous stages [28]. Each network branch is iterated, with the next stage using the output of the previous stage as input to make predictions, as shown in equation (6).

$$\begin{cases} S^{t} = \rho^{t}\left(F, S^{t-1}, L^{t-1}\right), & \forall t \geq 2 \\ L^{t} = \phi^{t}\left(F, S^{t-1}, L^{t-1}\right), & \forall t \geq 2 \end{cases} \tag{6}$$

In equation (6), $F$ is the feature map extracted by the backbone, and $\rho$ and $\phi$ represent the two network branches, which output a key point prediction confidence map $S = (S_1, S_2, S_3, \ldots, S_J)$ and a joint connection matching degree vector field $L = (L_1, L_2, L_3, \ldots, L_M)$.
The loss functions of the two branch networks in stage $t$ are shown in equation (7).

$$\begin{cases} f_S^{t} = \sum_{j=1}^{J} \sum_{p} W(p) \cdot \left\| S_j^{t}(p) - S_j^{*}(p) \right\|_2^2 \\ f_L^{t} = \sum_{m=1}^{M} \sum_{p} W(p) \cdot \left\| L_m^{t}(p) - L_m^{*}(p) \right\|_2^2 \end{cases} \tag{7}$$

In equation (7), $S_j^{*}$ represents the confidence map of the athlete's body position in the real data, $L_m^{*}$ represents the affinity vector field in the real data, and $W$ represents the binary mask used to avoid false penalties in special cases. To avoid vanishing gradients, the loss is supplemented at each stage, as shown in equation (8).

$$f = \sum_{t=1}^{T} \left( f_S^{t} + f_L^{t} \right) \tag{8}$$

Equation (8) represents the final objective function. The human pose estimation method is prone to recognition errors, in which the background and occlusions are regarded as joint nodes, so the misidentified joint points need to be repaired. The expression is shown in equation (9).

$$d(k_1, k_2) = \sum_{i} \frac{y_i}{x_i} \tag{9}$$

In equation (9), $k_1$ and $k_2$ represent the poses in adjacent frames, and $B_1^{i}$ and $B_2^{i}$ represent the extracted bounding boxes of the body part. The feature points extracted from $B_1^{i}$ are $x_i$, and the feature points extracted from $B_2^{i}$ are $y_i$. The similarity between the pose of the body in the previous frame and the current pose is expressed by equation (10).

$$Sc_{g,h} = \lambda \sum_{i} \frac{n_i}{m_i} + (1 - \lambda) \cdot \left\| H_g - H_h \right\|_2 \tag{10}$$

In equation (10), $m_i$ represents the number of feature points of joint point $i$ in frame $g$, and $n_i$ represents the number of feature points of joint point $i$ in frame $h$. When the similarity is higher than the set threshold, the joint points in the previous sequence are used as candidates, and if the similarity is lower than the set threshold, the joint point data of this frame is cleared.
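The threshold rule at the end of this passage can be sketched as below. This is one possible reading, not the study's implementation: the similarity score of equation (10) is passed in as a plain number, and the function name `repair_frame`, the `suspect` set of misidentified joints and the dictionary layout are all hypothetical.

```python
def repair_frame(current, previous, suspect, similarity, threshold):
    """Repair suspect joints of the current frame from the previous frame.

    current, previous: dict mapping joint id to an (x, y) tuple or None.
    suspect: ids of joints judged to be misidentified (assumed to be given).
    similarity: pose similarity between the two frames, as in equation (10).
    """
    repaired = dict(current)
    for joint in suspect:
        if similarity > threshold and previous.get(joint) is not None:
            # Similar poses: reuse the previous joint position as candidate.
            repaired[joint] = previous[joint]
        else:
            # Dissimilar poses: clear the unreliable joint in this frame.
            repaired[joint] = None
    return repaired


# Hypothetical joints: 4 = right wrist, 3 = right elbow.
previous = {3: (120.0, 200.0), 4: (130.0, 150.0)}
current = {3: (121.0, 198.0), 4: (400.0, 20.0)}  # wrist lands on background
kept = repair_frame(current, previous, {4}, similarity=0.9, threshold=0.6)
cleared = repair_frame(current, previous, {4}, similarity=0.3, threshold=0.6)
```

Joints that are not flagged are left untouched in both cases, so the rule only affects the positions that the pose estimator got wrong.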
This study identifies the positions of joint points with the human posture recognition method. Such methods need to record the body shape information of basketball players; otherwise, inconsistency between the body shape information and the extracted feature points makes the prediction model unstable. In view of this, this study analyzes the angle change of athletes in the process of fixed-point shooting to improve the prediction accuracy. The arm plays an important role in shooting, so the right wrist, elbow, and shoulder are used as the main feature information. The coordinates of their joint points are expressed as $Rt(x_0, y_0)$, $Rb(x_1, y_1)$, $Rm(x_2, y_2)$, and the vector representation of the right forearm and right upper arm is shown in equation (11).

$$l_1 = (x_1 - x_0,\; y_1 - y_0), \quad l_2 = (x_2 - x_1,\; y_2 - y_1) \tag{11}$$

In equation (11), $l_1$ represents the right forearm vector and $l_2$ represents the right upper arm vector. The angle between the two vectors is given by equation (12).

$$\theta = \cos^{-1}\left( \frac{l_1 \cdot l_2}{\| l_1 \| \, \| l_2 \|} \right) \tag{12}$$

In equation (12), $\theta$ represents the angle of the right arm. When the joint point of the hand is at the highest point, the angles formed by the forearm, the upper arm and the torso reach their maximum, and the angle feature data is input into the classifier for prediction. Predicting the result of a fixed-point shot is a classification problem with two classes, hit and miss, and the study employs a support vector machine algorithm. Support vector machines excel when working with complex datasets, especially for high-dimensional and large-scale data. Classification learning finds a hyperplane that divides the samples in space, as shown in equation (13).

$$\omega^{T} x + b = 0 \tag{13}$$

In equation (13), $x = (x_1, x_2, x_3, \ldots, x_n)$ represents the angular feature data of the athlete's limbs, with a dimension of 7, $\omega = (\omega_1, \omega_2, \omega_3, \ldots, \omega_n)$ represents the normal vector of the hyperplane, and $b$ represents the bias term. The distance between each shooting pose sample and the hyperplane is given by equation (14).

$$l = \frac{\left| \omega^{T} x + b \right|}{\| \omega \|} \tag{14}$$

In equation (14), $l$ represents the distance between any sample $x$ and the hyperplane. If the classification is correct, then equation (15) holds.

$$\begin{cases} \omega^{T} x_i + b \geq +1, & y_i = +1 \\ \omega^{T} x_i + b \leq -1, & y_i = -1 \end{cases} \tag{15}$$

In equation (15), when a fixed-point shot is a hit, the sample lies on the positive side of the hyperplane, and vice versa. The basic expression of the support vector machine is shown in equation (16).

$$\min_{\omega, b} \; \frac{1}{2} \| \omega \|^{2} \quad \text{s.t.} \quad y_i \left( \omega^{T} x_i + b \right) \geq 1, \quad i = 1, 2, \ldots, n \tag{16}$$

In equation (16), the points in the sample set satisfy equation (15), and the distance from the closest of these points to the hyperplane is the margin.

In this study, a prediction system is designed based on the fixed-point shooting prediction model, and the system is designed and implemented according to the system demand analysis. The main goal is to apply the object detection and pose estimation algorithms to the process of athletes' fixed-point shooting [29]. The system can be applied not only to teams but also to individual basketball training, to help athletes improve their set shooting skills. In addition, non-functional requirements such as operability, reliability, scalability, ease of maintenance, ease of use, and the security of hardware devices need to be considered. The system adopts the B/S architecture, and its framework composition is shown in Figure 7.
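The arm angle feature of equations (11) and (12) and the hyperplane test of equations (13) to (15) can be sketched as follows. The function names, the joint coordinates and the hyperplane parameters `omega` and `b` are hypothetical values chosen for illustration; in the study the hyperplane would come from training the support vector machine on the 7-dimensional angle features.

```python
import math


def arm_angle(wrist, elbow, shoulder):
    """Angle in degrees between the forearm and upper arm vectors."""
    # Equation (11): l1 = (x1 - x0, y1 - y0), l2 = (x2 - x1, y2 - y1).
    l1 = (elbow[0] - wrist[0], elbow[1] - wrist[1])
    l2 = (shoulder[0] - elbow[0], shoulder[1] - elbow[1])
    dot = l1[0] * l2[0] + l1[1] * l2[1]
    norms = math.hypot(*l1) * math.hypot(*l2)
    # Clamp to [-1, 1] to guard against rounding before acos, equation (12).
    cos_theta = max(-1.0, min(1.0, dot / norms))
    return math.degrees(math.acos(cos_theta))


def decision_value(x, omega, b):
    """Left-hand side of equation (13): omega^T x + b."""
    return sum(w * xi for w, xi in zip(omega, x)) + b


def distance_to_hyperplane(x, omega, b):
    """Equation (14): |omega^T x + b| / ||omega||."""
    return abs(decision_value(x, omega, b)) / math.sqrt(sum(w * w for w in omega))


def predict_hit(x, omega, b):
    """Return +1 (hit) on the positive side of the hyperplane, else -1 (miss)."""
    return 1 if decision_value(x, omega, b) >= 0 else -1
```

Because the two vectors of equation (11) point along the arm in the same direction, a fully extended arm gives an angle of 0 degrees and a right-angle bend at the elbow gives 90 degrees.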
[Figure 7: Framework diagram of fixed-point shooting probability prediction system. Browser side: front end interaction layer (prediction of shooting probability). Application server side: business layer, resource layer (training weights, video data, athlete information), storage. Database module side: Database Management System. Other labels in the diagram: File system, Athlete testing, Player pose estimation, Prediction of hit probability, Connecting to external servers, GPU server side.]

The framework in Figure 7 is mainly composed of an application service module, a browser module, a database module and a GPU server module. The browser module contains a front-end interaction layer, which is mainly the fixed-point shooting probability prediction system. The fixed-point shooting hit prediction system mainly consists of information input, result prediction, and recording of the prediction results, which can be browsed by users. The application server includes a business layer, a resource layer, and a storage layer [30]. The resource layer contains the trained prediction weights, shooting video data, and athlete information. The database module is used to store the prediction results.

4 Model performance evaluation and discussion of basketball fixed-point shooting hit prediction model

To verify the applicability and superiority of the prediction model, the object detection performance was tested first, then the performance of the human pose estimation recognition algorithm was tested, and then the prediction accuracy was verified. Finally, the functional modules of the fixed-point shooting prediction system were tested.

4.1 Target detection and target tracking performance test

In chapter 3, a basketball fixed-point shooting hit prediction algorithm based on the human pose estimation algorithm was constructed. To verify the feasibility of the algorithm, a simulation experiment environment was built to analyze the algorithm.
The simulation experimental environment was built on the basic laboratory equipment, and the detailed information is shown in Table 2.

Table 2: Experimental environment setting

| Hardware configuration | | Software configuration | |
|---|---|---|---|
| CPU | AMD Ryzen 9 5950X | Operating system | CentOS 7 |
| GPU | NVIDIA RTX3090 | Programming environment | Python 3.8 |
| RAM | Corsair Vengeance LPX 32GB (2 x 16GB) DDR4 3200 | Simulation software | MATLAB R2021a |
| Storage device | Samsung 970 EVO Plus 1TB NVMe M.2 Internal SSD | Data set | NBA Player Movement Data |

In the experiment, the NBA Player Movement Data was used as the training and detection data set. The data was provided by NBA officials and includes five types of basic information: timestamp, player position, position of the ball, player identity information and game information. The content of this data set is constantly updated as games are played. It contained about 50000 records related to player posture, of which 30000 were used as the network training set and 20000 as the network test set. YOLOv5 is a common deep learning object detection algorithm, and the human pose estimation algorithm in this study is designed with it at the core. In the simulation experiment, the YOLOv5 deep neural network built was a deep learning network with three layers, and the specific parameters are shown in Table 3.

Table 3: Parameter setting of the YOLOv5 algorithm

| Name | Value | Name | Value |
|---|---|---|---|
| Input image size | 640 | Pre data augmentation | True |
| Batch size | 16 | Anchor box | Automatically match datasets |
| Learning rate | 0.01 | Loss function | GIoU |
| Weight decay | 0.005 | Confidence threshold | 0.25 |
| Optimizer | Adam | Non maximum suppression threshold | 0.45 |
| Learning rate scheduler | Cosine LR schedule | Iterations | 300 |

In the constructed YOLOv5 network, a lightweight convolutional neural network was used as the backbone, and it extracted the action features of the people in the input image through the pyramid feature network structure.
When extracting the action features, the high-level features of the image were combined with the low-level features, so that both the high-level semantic information and the action details of the image were extracted, which effectively improved the ability to extract features of the people in the image. In the above experimental environment, the model was analyzed with the parameters set in Table 3, and the results of the model are shown in Figure 8.

[Figure 8: Detection results of the model before and after improvement. Panels: (a) Before improvement, (b) After improvement, (c) Before improvement, (d) After improvement. Detection labels visible in the panels: Shooter 0.28, Shooter 0.90, Shooter 0.87.]

Figure 8(a) and Figure 8(b) show the detection and identification results before and after the improvement of the YOLOv5 method, respectively, and the green boxes represent the detected athletes. The confidence level before the improvement was low, and the recognition effect was poor. The moving basketball player in the second frame was not detected by the previous method, while the improved method clearly captured the player's body. Figure 8(c) and Figure 8(d) illustrate the dynamic tracking of people before and after the improvement of the YOLOv5 method, respectively. In Figure 8(c), one basketball player was missed because the two players overlapped visually, while in Figure 8(d), the partially occluded basketball player was still detected. The results showed that the improvement was clear. To further verify the superiority of the improved method, it was compared with the YOLOv5 and YOLOv5+DIoU methods, and the loss values and average accuracy are compared in Figure 9.
[Figure 9: Loss curve and average accuracy curve. Panels: (a) box loss (Box-loss) against iteration number; (b) target detection loss (Obj-loss) against iteration number; (c) average accuracy against iteration number. Curves: YOLOv5, YOLOv5+DIoU, YOLOv5+CIoU+CBAM.]

Figure 9(a) and Figure 9(b) show the intersection over union loss curve and the object detection loss curve, respectively. The abscissa is the number of iterations, and the ordinate represents the matching loss between the object and the prediction box and the detection loss, respectively. The graphs show that the improved method converged faster and reached a lower loss value. Figure 9(c) shows the average accuracy curve, which increased as the number of iterations increased. When the number of iterations of the research method reached 50, the average accuracy tended to be stable at 95.34%. Compared with the two methods before the improvement, the fluctuation of the research method was significantly smaller and the average accuracy was higher, which verified the rationality of the model improvement. To compare the three methods further, they were applied to four basketball game videos in the dataset, Q, W, E, and R, and the changes in the recall rate are shown in Figure 10.
[Figure 10: Comparison chart of recall rate of three methods. Panels: (a) Q, (b) W, (c) E, (d) R. Horizontal axes: method (YOLOv5, YOLOv5+DIoU, YOLOv5+CIoU+CBAM) and video frames (10, 15, 50, 100, 125, 200); vertical axis range: 0.75 to 1.00.]

In Figure 10, the abscissas represent the method category and the video frames; six frames from each of the four videos were selected for analysis, namely frames 10, 15, 50, 100, 125, and 200. The recall rates in Figure 10(c) were the lowest, possibly because video E contained more spectators, who partially occluded the basketball players; nevertheless, the research method still showed a high recall rate in Figure 10(c). The recall rate of the improved method was higher than that before the improvement, and its average recall rates in the four videos Q, W, E, and R were 97.73%, 96.72%, 95.34%, and 97.98%, respectively.

4.2 Performance test of human posture algorithm

To test the human posture algorithm, the limb angles in the key frames of the basketball players' fixed-point shooting process were used as input features, and precision, recall, F1 value, and accuracy were selected as the evaluation indexes. The improved OpenPose method is compared with the OpenPose, PoseNet, and Hourglass network algorithms on these indexes in Table 4.
Table 4: Comparison of evaluation indicators using four methods

| Method | Prediction | Recall | Precision | Accuracy | F1 value |
|---|---|---|---|---|---|
| OpenPose | In | 85.45% | 72.31% | 81.23% | 79.51% |
| OpenPose | Out | 76.27% | 88.14% | 81.34% | 84.31% |
| PoseNet | In | 84.48% | 75.36% | 83.45% | 79.56% |
| PoseNet | Out | 77.23% | 87.24% | 87.63% | 80.94% |
| Hourglass | In | 88.45% | 81.24% | 86.35% | 83.18% |
| Hourglass | Out | 82.57% | 88.73% | 83.68% | 85.05% |
| Improved OpenPose | In | 96.23% | 87.16% | 89.75% | 88.19% |
| Improved OpenPose | Out | 85.13% | 95.72% | 90.86% | 89.53% |

In Table 4, the improved method performed better, and all of its indicators were better than those of the original OpenPose algorithm. Its recall rate was 96.23% for hit prediction and 85.13% for miss prediction; the values in the Precision and Accuracy columns were 87.16% and 89.75% for hit prediction and 95.72% and 90.86% for miss prediction; and its F1 value was 88.19% for hit prediction and 89.53% for miss prediction. PoseNet performed similarly to OpenPose, and Hourglass performed better than both on nearly all indicators. The improved method outperformed all three, which illustrated its superiority. The experiment showed that different limb angle features affect the shooting percentage, so the study analyzed the effect of different features on the prediction rate, as shown in Figure 11.
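The four evaluation indexes in Table 4 all follow from the counts of a binary confusion matrix. As a minimal sketch, assuming "In" (hit) is treated as the positive class, they can be computed as below; the function name and the example counts are hypothetical and are not taken from the paper's data.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, accuracy and F1 from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


# Hypothetical counts for illustration only (not from the paper):
# 90 hits predicted correctly, 10 misses predicted as hits,
# 5 hits predicted as misses, 95 misses predicted correctly.
example = classification_metrics(tp=90, fp=10, fn=5, tn=95)
```

Swapping the roles of the two classes (treating "Out" as positive) gives the second row reported for each method.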
[Figure 11: The impact of different features on prediction rate. Panels: (a) the impact of the characteristics of various parts of the limbs on accuracy (axes: prediction accuracy, accuracy impact), (b) key angle features. Features shown: angle between upper and lower arms, angle between thighs and calves, angle between upper arm and body, shooter's forearm angle, angle between calves and feet, body thigh angle, angle between the upper and lower arms of the auxiliary hand, arm acceleration.]

Figure 11(a) shows the effect of the features of each part of the limbs on the accuracy. The calf-foot angle and arm acceleration had the least impact, each affecting the accuracy by no more than 1%. The limb features with the greatest influence on the accuracy of fixed-point shooting prediction were the angle between the upper and lower arms of the shooting hand and the angle between the thigh and calf, each reaching 7% to 8%. Next was the angle between the upper and lower arms of the auxiliary hand, with an influence of 6%. Athletes who want to improve the accuracy of their fixed-point shooting therefore need to pay attention to these three features during training and control the corresponding angles. Figure 11(b) shows the three types of limb features that have the greatest impact on accuracy during shooting, with the angles of these features clearly marked. In summary, the results showed not only that the research method is highly accurate and that the human posture algorithm designed in the study performs well, but also that the method can identify the key features of fixed-point shooting and provide suggestions for improving basketball players' shooting training.
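The limb-angle features discussed above, such as the angle between the upper and lower arms, can be derived from three 2D keypoints returned by a pose estimator. The following is a minimal sketch under that assumption; the function name, the keypoint coordinates, and the use of shoulder, elbow, and wrist points are illustrative, since this section does not describe how the authors computed the angles.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at vertex b formed by the 2D points a, b, c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    if n1 == 0 or n2 == 0:
        raise ValueError("keypoints must be distinct from the vertex")
    cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.degrees(math.acos(cos_angle))


# Hypothetical pixel coordinates for shoulder, elbow and wrist:
elbow_angle = joint_angle((0.0, 0.0), (1.0, 0.0), (1.0, 1.0))
```

The same function applies to the thigh-calf angle by passing hip, knee, and ankle keypoints.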
4.3 Performance test of fixed-point shooting prediction model

To verify the superiority of the research model, its Receiver Operating Characteristic (ROC) curve was compared with those of three other predictive models: the Extreme Gradient Boosting (XGBoost) model, the Bagging model, and the K-Nearest Neighbors (KNN) model. The comparison results are shown in Figure 12.

[Figure 12: Comparison of ROC curves for four methods. Vertical axis: Sensitivity (TPR); horizontal axis: FPR. Curves: research method (AUC=0.974), KNN (AUC=0.902), XGBoost (AUC=0.826), Bagging (AUC=0.793).]

In Figure 12, the area under the ROC curve, the Area Under Curve (AUC), takes values between 0 and 1 and gives an intuitive measure of model accuracy: the larger the AUC, the higher the accuracy. The AUC of the research model was the largest, reaching 0.974, which is very close to 1. KNN came second, with an AUC of 0.902 and relatively high accuracy. The AUC values of the remaining models, XGBoost (0.826) and Bagging (0.793), lay between 0.70 and 0.85, indicating moderate accuracy. The dataset was divided into five parts, and the F1 value, accuracy, and recall of the four models on each part were compared, as shown in Figure 13.

[Figure 13: Comparison of F1 value, accuracy and recall rate of four models. Panels: (a) F1 value change, (b) accuracy change, (c) recall change. Horizontal axes: data set (1 to 5) and method (Bagging, XGBoost, KNN, research method); vertical axes: F1/%, Accuracy/%, Recall/% (60.00 to 100.00).]

In Figure 13, the F1 value, accuracy, and recall of the research model were the highest. Specifically, its average values on the dataset were 95.54%, 96.39%, and 98.25%, respectively, which indicated excellent performance.
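For the ROC comparison in Figure 12, the AUC can be read as the probability that a randomly chosen hit receives a higher predicted score than a randomly chosen miss. The sketch below computes the AUC directly from that pairwise definition; the function name and the sample labels and scores are hypothetical, and the paper does not state which implementation was used.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 1 (hit) or 0 (miss); scores: predicted hit scores.
    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical example: three of the four hit/miss pairs are ranked correctly.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
```

This quadratic-time form is meant for clarity; it gives the same value as the area obtained by integrating the ROC curve with the trapezoidal rule.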
Ranked from best to worst performance, the models were the research model, the KNN model, the XGBoost model, and the Bagging model, which again demonstrated the superiority of the research model in the prediction task. To verify the applicability of the fixed-point shooting prediction system designed in the study, several modules of the system were tested, and the results displayed in the browser are shown in Figure 14.

[Figure 14: Performance testing of fixed-point shooting hit prediction system. Panels: (a) targeted shooter testing template, (b) prediction results module. Interface options: information entry, basketball player testing, hit prediction, save the results. Displayed values: "Shooter 0.92" and the result "In".]

Figure 14(a) shows the shooting athlete detection module. The interface offered options for information entry, athlete detection, hit prediction, and saving results. The selected option was marked in red, the athlete in the picture was marked with a green box, and the displayed shooting hit rate was 0.92. Figure 14(b) shows the prediction results module. Because the predicted hit rate of 0.92 was close to 1, the system recognized the shot as a hit. The experimental results showed that the fixed-point shooting prediction system designed in the study operated normally, and that the user could predict the outcome of an athlete's fixed-point shot and save the results. The modules are clear and easy to understand, which makes the system user-friendly.

4.4 Practical application and robustness analysis of the model

To verify the robustness of the constructed model, two test experiments were designed, one under full system memory load and one with abnormal input; the results are shown in Figure 15.
[Figure 15: The robustness analysis of the algorithm. Horizontal axis: number of samples (100 to 700); vertical axis: Accuracy/% (80 to 100). Curves: normal, abnormal input, memory full load.]

Figure 15 shows that the accuracy of the model remained stable even under high memory pressure or abnormal data input. In the normal state, the highest prediction accuracy of the model reached 98%. With abnormal data input, the accuracy decreased but remained above 90%, and the highest accuracy still reached 96%. The full memory load had little effect on the model, whose overall prediction accuracy remained about 95%. The Gated Recurrent Unit (GRU) and the Support Vector Machine (SVM) are commonly used prediction algorithms. To compare the algorithms in practical application, the different shooting prediction algorithms were applied in different games, and the results of predicting the athletes' shooting accuracy are shown in Table 5.

Table 5: Practical application analysis of the pose-recognition-based hit prediction algorithms

| Game | Proposed method: Time (s) | Proposed method: Accuracy (%) | GRU: Time (s) | GRU: Accuracy (%) | SVM: Time (s) | SVM: Accuracy (%) |
|---|---|---|---|---|---|---|
| 1 | 1.5 | 95 | 3.1 | 88 | 3.3 | 84 |
| 2 | 1.2 | 94 | 2.5 | 87 | 3.1 | 85 |
| 3 | 1.3 | 93 | 2.6 | 85 | 3.2 | 86 |
| 4 | 1.4 | 95 | 2.8 | 89 | 3.5 | 89 |
| 5 | 1.3 | 96 | 2.8 | 85 | 3.4 | 88 |

In Table 5, the prediction accuracy of the proposed algorithm was above 90% in every game and reached 96%, while the prediction accuracy of GRU and SVM was at most 89%. In terms of time consumption, the proposed algorithm gave its prediction results within 2 s, which greatly speeds up the prediction of the direction of the game, whereas the GRU and SVM algorithms needed between 2.5 s and 3.5 s.

4.5 Discussion

Basketball is played all over the world, and the outcome of a game is decided by the points scored from successful shots.
As basketball has developed, scholars have found that athletes' shooting percentage can be predicted. Özkan [11] found that athletes show a certain change in excitation before shooting, and that different degrees of change correspond to different shooting postures; by analyzing the changes in muscle excitation of the athletes during shooting, the shooting posture they are about to adopt can be predicted. Naik and Hashmi [13] found that, when analyzing the motion trajectory of an object, its landing point could be predicted from its initial direction. In basketball, analyzing the shooting posture of the athletes reveals the movement state of the ball, from which its landing point can be inferred and whether the shot will hit can be determined. Therefore, the study proposed to use a deep learning network to build a human posture recognition algorithm that analyzes the movement state imparted to the basketball by the athlete and, from it, the athlete's shooting percentage. The human posture recognition algorithm designed in the study reached an accuracy of more than 90% and a recall rate of more than 95% when judging the posture of athletes, showing excellent recognition ability. The accuracy of the shooting prediction model based on the human posture recognition algorithm reached more than 95%. An accurate analysis of players' shooting rates can help the team coach arrange tactics in advance. For example, the players' shooting state can first be analyzed from simulated shooting during the pre-game warm-up, and players who are out of form can then be temporarily rested.
5 Conclusion

To predict the probability of a hit in fixed-point shooting, a prediction model for athletes' fixed-point shooting was constructed based on the YOLOv5 and OpenPose algorithms, and a prediction system was designed. The results showed that the improvement to the YOLOv5 algorithm was significant: the average accuracy of the improved YOLOv5 algorithm reached 95.34% when the number of iterations was 50. Applied to the research dataset, its average recall rates in the four videos Q, W, E, and R were 97.73%, 96.72%, 95.34%, and 97.98%, respectively, and it detected and tracked athletes well. In the comparison of improved OpenPose with OpenPose, PoseNet, and the Hourglass network, improved OpenPose performed better, with a recall rate of 96.23%, an accuracy of 87.16%, a precision of 89.75%, and an F1 value of 88.19%. The analysis of the influence of different features on the prediction rate found that three types of limb features had the largest influence, reaching 6% or more: the angle between the upper and lower arms of the shooting hand, the angle between the thigh and the calf, and the angle between the upper and lower arms of the auxiliary hand. The results showed that basketball players need to pay attention to these three features and control the corresponding angles in order to improve their fixed-point shooting rate. In the comparison with the KNN, XGBoost, and Bagging models, with the area under the ROC curve as the evaluation index, the AUC of the research model was the largest, at 0.974. The F1 value, accuracy, and recall rate of the research model were also the highest, reaching 95.54%, 96.39%, and 98.25%, respectively. The system designed in the study ran properly and clearly displayed the predicted results. The results verified the superiority of the research model and indicated that the research can provide a useful reference for tactical analysis and player performance evaluation in basketball games. However, there are still shortcomings in this study.
The dataset selected for this study comes from competition videos, which contain a large amount of redundant information and consume considerable training time.

6 Ethical compliance

The dataset used in the experiment is a publicly available dataset that does not involve athletes' personal privacy or data leakage and has no potential impact on athletes.

References

[1] A. Rodríguez-Fernández, R. Ramirez-Campillo, J. Raya-González, D. Castillo, and F. Y. Nakamura, “Is physical fitness related with in-game physical performance? A case study through local positioning system in professional basketball players,” Proceedings of the Institution of Mechanical Engineers, Part P: Journal of Sports Engineering and Technology, vol. 237, no. 3, pp. 188-196, 2023. https://doi.org/10.1177/17543371211031160
[2] P. Soltani, and A. H. Morice, “A multi-scale analysis of basketball throw in virtual reality for tracking perceptual-motor expertise,” Scandinavian Journal of Medicine and Science in Sports, vol. 33, no. 2, pp. 178-188, 2023. https://doi.org/10.1111/sms.14250
[3] Y. Ren, Z. Wang, Y. Wang, S. Tan, Y. Chen, and J. Yang, “GoPose: 3D human pose estimation using WiFi,” Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, vol. 6, no. 2, pp. 1-25, 2022. https://doi.org/10.1145/3534605
[4] W. Liu, Q. Bao, Y. Sun, and T. Mei, “Recent advances of monocular 2D and 3D human pose estimation: A deep learning perspective,” ACM Computing Surveys, vol. 55, no. 4, pp. 1-41, 2022. https://doi.org/10.1145/3524497
[5] L. Lonini, Y. Moon, K. Embry, R. J. Cotton, K. McKenzie, S. Jenz, and A. Jayaraman, “Video-based pose estimation for gait analysis in stroke survivors during clinical assessments: A proof-of-concept study,” Digital Biomarkers, vol. 6, no. 1, pp. 9-18, 2022. https://doi.org/10.1159/000520732
[6] S. Dubey, and M. Dixit, “A comprehensive survey on human pose estimation approaches,” Multimedia Systems, vol. 29, no. 1, pp. 167-195, 2023. https://doi.org/10.1007/s00530-022-00980-0
[7] S. Liu, N. He, C. Wang, H. Yu, and W. Han, “Lightweight human pose estimation algorithm based on polarized self-attention,” Multimedia Systems, vol. 29, no. 1, pp. 197-210, 2023. https://doi.org/10.1007/s00530-022-00981-z
[8] X. Qin, H. Guo, C. He, and X. Zhang, “Lightweight human pose estimation: CVC-net,” Multimedia Tools and Applications, vol. 81, no. 13, pp. 17615-17637, 2022. https://doi.org/10.1007/s11042-022-12245-z
[9] J. Xu, W. Liu, W. Xing, and X. Wei, “MSPENet: multi-scale adaptive fusion and position enhancement network for human pose estimation,” The Visual Computer, vol. 39, no. 5, pp. 2005-2019, 2023. https://doi.org/10.1007/s00371-022-02460-y
[10] H. Jiang, “Application of deep learning method in automatic collection and processing of video surveillance data for basketball sports prediction,” Arabian Journal for Science and Engineering, vol. 48, no. 3, pp. 4111-4112, 2023. https://doi.org/10.1007/s13369-021-05884-1
[11] D. G. Özkan, “Predicting the fate of basketball throws: An EEG study on expert action prediction in wheelchair basketball players,” Experimental Brain Research, vol. 237, no. 12, pp. 3363-3373, 2019. https://doi.org/10.1007/s00221-019-05677-x
[12] D. Siemon, and W. Jörn, “Performance prediction of basketball players using automated personality mining with twitter data,” Sport, Business and Management: An International Journal, vol. 13, no. 2, pp. 228-247, 2023. https://doi.org/10.1108/sbm-10-2021-0119
[13] B. T. Naik, and M. F. Hashmi, “LSTM-BEND: Predicting the trajectories of basketball,” IEEE Sensors Letters, vol. 7, no. 4, pp. 1-4, 2023. https://doi.org/10.1109/LSENS.2023.3253863
[14] X. Wang, J. Tong, and R. Wang, “Attention refined network for human pose estimation,” Neural Processing Letters, vol. 53, no. 4, pp. 2853-2872, 2021. https://doi.org/10.1007/s11063-021-10523-9
[15] Y. Liu, and X. Hou, “Fixed-resolution representation network for human pose estimation,” Multimedia Systems, vol. 28, no. 5, pp. 1597-1609, 2022. https://doi.org/10.1007/s00530-022-00919-5
[16] X. Wang, R. Feng, H. Chen, R. Zimmermann, Z. Liu, and H. Liu, “Personalized motion kernel learning for human pose estimation,” International Journal of Intelligent Systems, vol. 37, no. 9, pp. 5859-5879, 2022. https://doi.org/10.1002/int.22817
[17] J. Shi, and S. Kai, “A discrete-time and finite-state markov chain based in-play prediction model for NBA basketball matches,” Communications in Statistics - Simulation and Computation, vol. 50, no. 11, pp. 3768-3776, 2021. https://doi.org/10.1080/03610918.2019.1633351
[18] X. Cong, S. Li, F. Chen, C. Liu, and Y. Meng, “A review of YOLO object detection algorithms based on deep learning,” Frontiers in Computing and Intelligent Systems, vol. 4, no. 2, pp. 17-20, 2023. https://doi.org/10.54097/fcis.v4i2.9730
[19] R. A. Murugan, and B. Sathyabama, “Object detection for night surveillance using ssan dataset based modified yolo algorithm in wireless communication,” Wireless Personal Communications, vol. 128, no. 3, pp. 1813-1826, 2023. https://doi.org/10.1007/s11277-022-10020-9
[20] S. Pastel, J. Marlok, N. Bandow, and K. Witte, “Application of eye-tracking systems integrated into immersive virtual reality and possible transfer to the sports sector-A systematic review,” Multimedia Tools and Applications, vol. 82, no. 3, pp. 4181-4208, 2022. https://doi.org/10.1007/s11042-022-13474-y
[21] Q. He, X. Li, and W. Li, “Common sports injuries of track and field athletes using cloud computing and internet of things,” International Journal of Computational Intelligence Systems, vol. 16, no. 1, pp. 70, 2023. https://doi.org/10.1007/s44196-023-00257-y
[22] S. Velugoti, and M. P. Vani, “An Approach for Privacy Preservation Assisted Secure Cloud Computation,” Informatica, vol. 47, no. 10, pp. 41-52, 2023. https://doi.org/10.31449/inf.v47i10.4586
[23] J. Fan, X. Yang, R. Lu, W. Li, and Y. Huang, “Long-term visual tracking algorithm for UAVs based on kernel correlation filtering and SURF features,” The Visual Computer, vol. 39, no. 1, pp. 319-333, 2023. https://doi.org/10.1007/s00371-021-02331-y
[24] V. Gali, B. C. Babu, R. B. Mutluri, M. Gupta, and S. K. Gupta, “Experimental investigation of harris hawk optimization-based maximum power point tracking algorithm for photovoltaic system under partial shading conditions,” Optimal Control Applications and Methods, vol. 44, no. 2, pp. 577-600, 2023. https://doi.org/10.1002/oca.2773
[25] M. Dunnhofer, A. Furnari, G. M. Farinella, and C. Micheloni, “Visual object tracking in first person vision,” International Journal of Computer Vision, vol. 131, no. 1, pp. 259-283, 2023. https://doi.org/10.1007/s11263-022-01694-6
[26] D. Yang, “Research on multi-target tracking technology based on machine vision,” Applied Nanoscience, vol. 13, no. 4, pp. 2945-2955, 2023. https://doi.org/10.1007/s13204-021-02293-6
[27] J. Zhang, Y. He, W. Feng, J. Wang, and N. N. Xiong, “Learning background-aware and spatial-temporal regularized correlation filters for visual tracking,” Applied Intelligence, vol. 53, no. 7, pp. 7697-7712, 2023. https://doi.org/10.1007/s10489-022-03868-8
[28] K. Aygül, M. Cikan, T. Demirdelen, and M. Tumay, “Butterfly optimization algorithm based maximum power point tracking of photovoltaic systems under partial shading condition,” Energy Sources, Part A: Recovery, Utilization, and Environmental Effects, vol. 45, no. 3, pp. 8337-8355, 2023. https://doi.org/10.1080/15567036.2019.1677818
[29] A. Chessa, P. D’Urso, L. De Giovanni, V. Vitale, and A. Gebbia, “Complex networks for community detection of basketball players,” Annals of Operations Research, vol. 325, no. 1, pp. 363-389, 2023. https://doi.org/10.1007/s10479-022-04647-x
[30] H. Mokayed, T. Z. Quan, L. Alkhaled, and V. Sivakumar, “Real-time human detection and counting system using deep learning computer vision techniques,” Artificial Intelligence and Applications, vol. 1, no. 4, pp. 221-229, 2023. https://doi.org/10.47852/bonviewAIA2202391