https://doi.org/10.31449/inf.v48i10.5960 Informatica 48 (2024) 1–18

Artistic Image Style Conversion Based on Multi-Scale Feature Fusion Network

Huizhou Li 1*, Wubin Zhu 2
1 School of Fine Arts and Design, Hefei Normal University, Hefei, 230601, China
2 Zhejiang Uniview Technologies Co., Ltd, Hangzhou, 310051, China
E-mail: lihuizhou@hfnu.edu.cn, wubing202403@163.com
* Corresponding author

Keywords: CNN, MSFF network, AM, artistic images, style conversion

Received: March 29, 2024

To enhance the efficiency and quality of artistic image style conversion, this study improves the convolutional neural network style conversion algorithm by introducing a multi-scale feature fusion network that comprehensively considers different convolutional features. An attention mechanism is then combined to extract the important features of artistic images. The proposed method required less conversion time, CPU usage, and memory usage in artistic image style conversion, with better conversion performance. The research method achieved high peak signal-to-noise ratio and structural similarity index values when converting different artistic styles. The highest peak signal-to-noise ratios for converting to Van Gogh artistic style, Ukiyo-e style, Monet style, and Cézanne style were 22.892, 17.844, 21.647, and 22.291, respectively, and the highest structural similarity index values were 0.842, 0.783, 0.845, and 0.843, respectively. The research has achieved effective conversion of target styles while preserving content in images, improving the quality and effectiveness of artistic image style conversion and promoting the development of image processing technology.

Povzetek: Študija izboljšuje algoritem za pretvorbo umetniških slik v drugačne stile s pomočjo konvolucijskih globokih nevronskih mrež.

1 Introduction

With the progress of artificial intelligence and computer vision technology, artistic image style conversion technology has gradually become a research hotspot. Artistic image style conversion technology aims to convert the style of one image into that of another, which has broad application prospects [1]. However, existing methods have certain limitations in processing large-scale image data and performing style conversion while preserving content. Traditional methods suffer from style distortion and slow computational speed. Therefore, research on artistic image style conversion based on Multi-scale Feature Fusion (MSFF) networks is of great significance. The MSFF is a network structure that utilizes deep learning techniques to fuse image features extracted at different scales. By integrating feature information, the accuracy and effectiveness of image processing tasks can be improved, which is suitable for fields such as image style conversion, semantic segmentation, etc. [2]. Feature information from various scales is fused, which can retain content information while approximating the target style information in the generated image, improving the quality and efficiency of style conversion [3]. Therefore, the research aims to use MSFF to achieve more accurate and efficient artistic image style conversion, and to improve its quality and efficiency. The innovation lies in the introduced attention mechanism, which helps the network focus on more important features. The research provides an effective solution for the development of artistic image style conversion technology, with significant scientific research value and practicality. It can promote the progress and application of related technologies. This study has four parts.
The first reviews the literature, summarizing the existing results on MSFF networks and artistic image style conversion. The second mainly discusses the improved Convolutional Neural Network (CNN), the MSFF network, and the attention mechanism for artistic image style conversion. The third mainly compares the research method with other methods through experiments. The last part summarizes the achievements and shortcomings.

2 Related works

The MSFF network is a crucial and widely used research direction, which integrates feature information at various scales and improves image processing and analysis performance. Zhou et al. built an unsupervised dense network based on MSFF and residual modules to address multi-focus image fusion. It performed better, providing an efficient solution [4]. Deng et al. designed an efficient and lightweight MSFF multi-tasking strategy to address the challenges of cell segmentation and counting. A new up-sampling method, a norm combination loss function, and a coordinated multi-tasking training discriminator were introduced to achieve non-point-based cell counting and segmentation tasks based on cell count and global segmentation annotations. Compared with traditional methods, the research method had fewer parameters and better performance. The speed increased by nearly ten times [5]. Wang et al. built a method based on feature fusion and a hybrid strategy to address the significant challenge of re-identifying individuals. The ResNet50 backbone was improved and implemented with a deep kernel pooling strategy and a mixed loss function. On three datasets including CUHK3, the research method had higher recognition accuracy, surpassing multiple advanced methods [6]. Wang et al. developed an MSFF network framework to solve the difficult problem of single-image crowd counting. This network combined an encoder-decoder, dense dilated convolutional blocks, and a channel attention mechanism to improve the accuracy of density maps. It was superior to existing methods. The ablation study confirmed the effectiveness of each component [7]. Shen et al. built a hyper-spectral classification strategy based on a three-dimensional MSFF strategy and a channel attention mechanism to address the difficulties of traditional 2D or 3D deep CNNs in hyper-spectral image classification. The proposed method made significant progress in hyper-spectral data classification, solving the challenges of traditional methods in dealing with limited training samples and excessive parameters [8]. Applying algorithms to solve image-related problems is an important and widely used approach that can achieve functions such as image recognition, processing, and analysis, which has great value in fields such as computer vision, medical imaging, and security monitoring. Sun et al. developed a strategy to improve the structure and weight initialization of deep CNNs to solve image classification problems. A variable-length gene coding strategy was used to represent network building blocks and depth. New connection weights were introduced to initialize the representation scheme. It could improve computational efficiency and was superior to existing designs in terms of classification error rate and weight quantity [9]. Bi et al. designed a genetic programming algorithm with knowledge transfer to address the high computational cost of current large-scale image classification. A new fitness function and ensemble representation were used to build an effective image classification ensemble.
It could achieve better classification ability in a shorter computation time, which had significant advantages over baseline genetic programming algorithms and other algorithms [10]. In response to the high time consumption of fractal image compression, Li et al. developed a specific update strategy to improve the computational time of fractal image compression. Experimental results showed that, while maintaining image quality, the research method had higher encoding efficiency. It could effectively reduce encoding time [11]. Alkishriwo proposed an adaptive multi-resolution image decomposition strategy to optimize image compression without reducing image quality, which conducted multi-resolution decomposition in different directions. The designed method performed excellently in compression ratio, bringing new solutions to image compression [12]. Tade and Vyas proposed a hybrid depth classifier to classify tone-mapped images in various visualization applications. The research method was superior to other image quality evaluation methods. It had the potential to solve the color tone-mapping challenge in high dynamic range environments, providing the best quality images for specific visualization applications [13]. A summary of the related works is shown in Table 1.

Table 1: Related works summary table
Author | Main method | Key result | Limitation
Zhou et al. [4] | Unsupervised dense networks | Extract source image features | May require a significant amount of computing resources
Deng et al. [5] | Efficient and lightweight multi-scale feature fusion multi-task model | Fewer parameters, better performance, and nearly ten times faster | Small object detection may not be accurate enough
Wang et al. [6] | A method based on feature fusion and hybrid strategy | Higher recognition accuracy | Weak generalization ability
Wang et al. [7] | Multi-layer feature fusion network framework | Improved the accuracy of density maps | May lead to over-fitting
Shen et al. [8] | Hyper-spectral image classification method | Remarkable progress in hyper-spectral data classification | Unclear applicability to non-hyperspectral image data
Sun et al. [9] | An improved structure for deep convolutional neural networks | Improved computational efficiency | Possibly excessive search space
Bi et al. [10] | Divide-and-conquer genetic programming | Better classification performance in less computation time | Need to design fitness functions
Li et al. [11] | Specific update search algorithm | Higher coding efficiency | Image compression may not be ideal for complex textures
Alkishriwo [12] | Image decomposition algorithm based on adaptive multi-resolution | Excellent performance in peak signal-to-noise ratio | Sensitive to parameter selection
Tade and Vyas [13] | A hybrid depth classifier | Solved color tone mapping in high dynamic range environments | Unable to determine the specific classification effect

In summary, integrating feature information from different scales can improve image processing and analysis performance. Given the style distortion and slow computational speed of traditional methods for artistic image style conversion, this study utilizes the MSFF network for artistic image style conversion, achieving more accurate image artistic style conversion.

3 Feature extraction and style conversion of artistic images based on the MSFF network

To improve the efficiency and quality of artistic image style conversion, the CNN is first improved to better extract image features.
Then, the MSFF network is used to fuse features at various scales to improve the expression ability. The attention mechanism is adopted to focus on the more important features of artistic images.

3.1 Image feature extraction based on improved CNN

Image style conversion algorithms based on deep learning use CNNs to extract image features. Then a U-shaped network structure is used for style conversion. The high-level convolutional features of the input content image and the target style image are calculated by the encoder. The style conversion algorithm is combined to form a fused feature map, which is ultimately mapped back to the original pixel space by the decoder to obtain the target style conversion image [14]. Figure 1 displays the process.

Figure 1: Artistic image style conversion process (content image and style image → encoder → multi-scale feature fusion style transformation algorithm → decoder → style conversion image)

In Figure 1, the features extracted by convolution at different levels exhibit different characteristics. As the network depth increases, the extracted overall contour features become more blurred yet more representative. In view of this, an improved CNN style conversion algorithm is proposed in this study to address issues such as detail deterioration and local structural distortion caused by input images with complex spatial structures. The algorithm structure includes an encoder, a conversion network, and a decoder. A novel feature detection strategy is used to grasp features with fewer parameters. By decomposing the large convolutional kernels in the conversion network, the parameters are reduced and the conversion speed is improved [15]. An adaptive normalization method is used to process the output of the Convolutional Layer (CL) to better preserve the semantic information of content images. This algorithm achieves fast conversion of multiple styles, enhances the structural features, and significantly improves the detail effects. In style conversion, the encoder extracts features under various CLs. The encoder adopts a pre-trained Visual Geometry Group (VGG) structure, as shown in Figure 2.

Figure 2: Encoder structure (3×3 conv1_1, 64; 3×3 conv1_2, 64, pool/2; 3×3 conv2_1, 128; 3×3 conv2_2, 128, pool/2; 3×3 conv3_1, 256; 3×3 conv3_2, 256; 3×3 conv3_3, 256; 3×3 conv3_4, 256, pool/2; 3×3 conv4_1, 512)

In Figure 2, from conv1_1 to conv4_1, all convolution kernels have a size of 3×3. Each CL is followed by a ReLU activation function. After conv1_2, conv2_2, and conv3_4, there is a max pooling layer for down-sampling. In feature fusion, the content feature map and style feature map are output for fusion at the conv2_1, conv3_1, and conv4_1 sections, avoiding fusion of the conv1_1 results so as not to affect the quality of style conversion [16-17]. To reduce parameter calculations, the large convolutional kernel in the CL of the conversion network is decomposed: two 5×5 convolutional kernels are used instead of one 9×9 convolutional kernel, keeping the receptive field unchanged while increasing the network depth and learning ability. In the conversion network, Adaptive Instance Normalization (AdaIN) is introduced to automatically match the feature statistics of content images and style images. The mean and variance of the content feature information are aligned with the mean and variance of the style feature information to obtain the target feature map h. After AdaIN, the content Loss Function (LF) and style LF are obtained, as shown in equation (1).
L_{C,AdaIN} = \| g(h) - h \|_2^2, \quad L_{S,AdaIN} = \| \mu(G(y)) - \mu(G(y_s)) \|_2^2 + \| \sigma(G(y)) - \sigma(G(y_s)) \|_2^2    (1)

In equation (1), L_{C,AdaIN} and L_{S,AdaIN} are the content LF and style LF, respectively, where \mu(\cdot) and \sigma(\cdot) denote the channel-wise mean and standard deviation of the encoder features [18]. The overall content perception LF and style perception LF for image style conversion are displayed in equation (2).

L_C(x, y) = \sum_{l=1}^{L} \frac{1}{H_l W_l N_l} \| M^l(x) - M^l(y) \|_2^2, \quad L_S(y_s, y) = \sum_{l=1}^{L} \| G^l(y_s) - G^l(y) \|_F^2    (2)

In equation (2), L_C(x, y) and L_S(y_s, y) represent the overall content perception LF and style perception LF of image style conversion, respectively, where M^l denotes the feature map of the l-th CL with dimensions H_l × W_l × N_l and G^l denotes the corresponding Gram matrix. Therefore, the total LF of the entire network training is obtained. It is trained and optimized by the stochastic gradient descent method, as shown in equation (3).

L = \alpha L_C + \beta L_S + \gamma L_R    (3)

In equation (3), \alpha and \beta represent the weight values of the content loss and style loss, respectively. L_R is the regularization term. \gamma represents the weight value of L_R [19]. Afterwards, the decoder parameters can be derived through the LF. After multiple rounds of training, the optimal decoder parameters can be obtained. The style conversion network training and CNN improvement are completed to extract image features. The improved CNN architecture aims to improve the extraction efficiency and quality of image features, as shown in Figure 3.

Figure 3: Improved CNN architecture (original image → encoder with pre-trained VGG → conversion network with adaptive normalization → decoder → target style image)

The architecture includes three main parts: the encoder, the conversion network, and the decoder. The encoder uses the pre-trained VGG network structure to extract image features through multiple convolutional layers and pooling layers. The conversion network reduces the number of parameters by decomposing large convolution kernels and optimizes the feature representation through adaptive normalization. The decoder is responsible for mapping the fused features back into the pixel space to generate the target style image.

3.2 Feature fusion based on the MSFF network

The features extracted by convolutional networks at different levels have different effects. Low-level convolution can grasp detailed information, which is beneficial for expressing local features. High-level convolution focuses more on the overall structural features of the image, such as shape and contour. Existing image style conversion algorithms mainly focus on converting high-level features into images. Although this can better express overall features, it may not achieve satisfactory results in terms of local details [20]. Accordingly, the MSFF network is introduced. By comprehensively taking into account different levels of convolutional features, the decoder considers both low-level and high-level information during image generation to obtain more satisfactory detail results. The artistic image style conversion based on the MSFF network first extracts content and style image features with the encoder and merges feature maps through MSFF. Finally, the target image is generated through the decoder. The core of the MSFF network is to integrate features of different scales to enhance feature expression ability.
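Before detailing the fusion module, the AdaIN alignment used in Section 3.1 and the weighted total loss in equation (3) can be illustrated with a minimal PyTorch sketch. The function names, tensor shapes, and weight values are illustrative assumptions (targeting a current PyTorch release rather than the PyTorch 0.4.0 used in the later experiments), not the authors' released implementation.

```python
import torch

def adain(content_feat: torch.Tensor, style_feat: torch.Tensor,
          eps: float = 1e-5) -> torch.Tensor:
    """Align the per-channel mean/std of content features to those of style
    features. Inputs are (N, C, H, W) feature maps from the same encoder
    layer (e.g. conv4_1 of the pre-trained VGG encoder)."""
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps
    # target feature map h in equation (1): content statistics replaced by style statistics
    return s_std * (content_feat - c_mean) / c_std + s_mean

def total_training_loss(l_content: torch.Tensor, l_style: torch.Tensor,
                        l_reg: torch.Tensor, alpha: float = 1.0,
                        beta: float = 10.0, gamma: float = 1e-6) -> torch.Tensor:
    """Weighted sum in equation (3); the weight values here are placeholders."""
    return alpha * l_content + beta * l_style + gamma * l_reg
```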
The network uses convolution kernels of different sizes to extract features in parallel through the multi-scale feature extraction layer, and then concatenation operations are carried out through the MSFF layer to integrate the features. A dimensionality reduction layer is used to reduce the number of channels in the fusion layer to avoid the curse of dimensionality. The MSFF module is displayed in Figure 4.

Figure 4: MSFF module structure (previous layer input → parallel 1×1, 3×3, 5×5, 7×7, and 9×9 convolutional kernels → concatenation layer → 1×1 convolutional kernel)

From Figure 4, the MSFF module aims to extract features at various scales in the image and fuse them, and it includes a multi-scale feature extraction layer, an MSFF layer, and a dimensionality reduction layer. The multi-scale feature extraction layer extracts features at various scales through multiple convolution kernels of various sizes, among which the 1×1 convolution kernel is used to preserve shallow information to improve image quality. When selecting the feature extraction scales, comprehensive consideration should be given to network parameters and over-fitting. The core function of the MSFF module is to extract image features at various scales and integrate these features together [21]. Multiple convolutional kernels with different sizes can extract features at different scales and perform nonlinear representations in the fusion layer. When determining the feature extraction scales, network parameters and over-fitting need to be balanced to achieve the best results. After each CL, the nonlinear mapping ability is enhanced through nonlinear layers. The input of all multi-scale feature extraction layers is X, and there are m CLs in this layer. Different layers have different convolution kernel sizes. Equation (4) represents the i-th CL in the first MSFF module.

f_1^i(X) = \sigma_1^i(W_1^i * X + B_1^i)    (4)

In equation (4), W_1^i and B_1^i are the weights and biases of the CL, respectively. * refers to the convolution operation. \sigma_1^i is the nonlinear unit after the i-th CL, which is presented in equation (5).

\sigma_1^i(x) = \max(0, x)    (5)

In equation (5), x stands for the input value of the nonlinear unit. The MSFF layer fuses the feature maps output by the multi-scale feature extraction layer and supplies them to the next layer for processing. This layer consists of concatenation operations, which overlay feature maps of various scales and channels together [22]. The number of channels in the fused feature map is equal to the total number of channels across all CLs of the multi-scale feature extraction layer. The fusion principle is displayed in Figure 5.

Figure 5: Fusion of MSFF layers (concatenation of the multi-scale feature maps)

From Figure 5, the MSFF layer mainly integrates three different types of features. Assuming that the multi-scale feature extraction layer has m CLs, the MSFF layer in the first MSFF module is displayed in equation (6).

f_1(X) = \mathrm{concat}_{i=1}^{m} f_1^i(X) = \mathrm{concat}_{i=1}^{m} \sigma_1^i(W_1^i * X + B_1^i)    (6)

In equation (6), X refers to the input value of the multi-scale feature extraction layer. The dimensionality reduction layer reduces the dimensionality of the MSFF layer, that is, it reduces its channel count. In a multi-feature extraction layer, each scale's CL typically requires a certain number of convolution kernels, as different convolution kernels can extract different features.
Although there are many channels in the CL at each scale, this may not cause dimensionality issues when the layers are used in cascade. However, after parallel use and fusion, the number of channels in the MSFF layer increases sharply, which may cause the curse of dimensionality and limit the network size [23]. Therefore, before entering the next multi-feature fusion module, the MSFF layer is dimensionally reduced to decrease the number of channels and facilitate feature fusion into the next module. The dimensionality reduction layer has a CL with a kernel size of 1×1 and a nonlinear activation unit. The number of channels in this CL is less than the number of channels in the multi-feature fusion layer. The 1×1 convolution kernel can retain all the information in the multi-feature fusion layer while reducing the number of channels in the final output multi-scale feature map, playing a dimensionality reduction role. The dimensionality reduction layer of the first MSFF module is displayed in equation (7).

F_1 = \sigma_1^{m+1}(W_1^{m+1} * f_1(X) + B_1^{m+1})    (7)

In equation (7), f_1(X) stands for the output value of the multi-feature fusion layer. W_1^{m+1} and B_1^{m+1} stand for the weights and biases of the CL in the dimensionality reduction layer, respectively. \sigma_1^{m+1} describes the operation of the nonlinear activation unit in the dimensionality reduction layer. The MSFF module utilizes convolution kernels of various sizes to extract multi-scale features of images. Multiple filter sets of various sizes extract and fuse multi-scale information from images [24]. After each CL, nonlinear activation units are introduced to learn the nonlinear mapping relationship between the input and the class labels. The dimensionality reduction operation avoids the curse of dimensionality caused by the increase in the number of channels in the MSFF layer and the limitation on network size, allowing modules to be used in multi-level cascades. It is suitable for MSFF network artistic image style conversion tasks. The MSFF module performs better when used in cascade. The first and l-th multi-scale feature modules are shown in equation (8).

F_1 = \sigma_1^{m+1}\left(W_1^{m+1} * \mathrm{concat}_{i=1}^{m}\, \sigma_1^i(W_1^i * X + B_1^i) + B_1^{m+1}\right), \quad F_l = \sigma_l^{m+1}\left(W_l^{m+1} * \mathrm{concat}_{i=1}^{m}\, \sigma_l^i(W_l^i * F_{l-1} + B_l^i) + B_l^{m+1}\right)    (8)

In equation (8), F_1 and F_l represent the first and l-th multi-scale feature modules, respectively. F_{l-1} stands for the output value of the previous MSFF module. W_l^{m+1} and B_l^{m+1} represent the weights and biases of the CL in the dimensionality reduction layer, respectively. The MSFF network includes multiple cascaded multi-feature fusion modules and a CL with a kernel size of 3×3, mainly achieving artistic image style conversion. These modules map the multi-scale features of one artistic image style to another artistic image style. Finally, the multi-scale features of an artistic image style are transformed into the desired artistic image through the 3×3 CL [25-26]. Assuming that L MSFF modules and a reconstruction layer (i.e., a convolutional layer) are used in the network, the mathematical expressions of the first L modules are shown in equation (9).

f_1(X) = F_1(X), \; l = 1; \quad f_l(X) = F_l(f_{l-1}), \; l = 2, \dots, L    (9)

In equation (9), F_l is used to describe the set of feature extraction, representation, and dimensionality reduction operations of the MSFF module on the input.
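To make equations (4)–(9) more concrete, the following is a minimal PyTorch sketch of one MSFF module and a small cascade ending in a 3×3 reconstruction layer (anticipating equation (10) below). The class name, channel counts, and the number of cascaded modules are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class MSFFModule(nn.Module):
    """One MSFF module: parallel convolutions at several scales (equation (4)),
    ReLU non-linearities (equation (5)), channel concatenation (equation (6)),
    and a 1x1 dimensionality-reduction layer (equation (7))."""

    def __init__(self, in_ch: int, branch_ch: int = 16, out_ch: int = 64,
                 kernel_sizes=(1, 3, 5, 7, 9)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, branch_ch, k, padding=k // 2),  # W_1^i * X + B_1^i
                nn.ReLU(inplace=True),                           # sigma_1^i
            )
            for k in kernel_sizes
        ])
        # 1x1 convolution reduces the concatenated channel count (equation (7))
        self.reduce = nn.Sequential(
            nn.Conv2d(branch_ch * len(kernel_sizes), out_ch, kernel_size=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([branch(x) for branch in self.branches], dim=1)  # equation (6)
        return self.reduce(fused)                                          # F_l in equation (8)

# Cascade of modules plus a 3x3 reconstruction layer (equations (9)-(10))
msff_net = nn.Sequential(
    MSFFModule(in_ch=3, out_ch=64),
    MSFFModule(in_ch=64, out_ch=64),
    nn.Conv2d(64, 3, kernel_size=3, padding=1),
)
```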
The final reconstruction layer, also known as the CL, is responsible for fusing the features at different scales together, as shown in equation (10).

F(X) = W^{L+1} * f_L(X) + B^{L+1}    (10)

In equation (10), f_L(X) stands for the output of the L-th MSFF module in the network, thus achieving the artistic image style conversion of the MSFF network. The overall structure of the Improved CNN-Multi-Scale Feature Fusion Network (ICNN-MFFN) is presented in Figure 6.

Figure 6: MSFF network structure (VGG encoder with layers conv1–conv4, multi-scale feature fusion, and decoder producing the style conversion image)

From Figure 6, the MSFF network structure utilizes feature information from different scales. An appropriate fusion strategy effectively combines this feature information into one network structure. The features at different scales are fused, which can better generalize targets, thereby more effectively achieving tasks including image segmentation and object detection. This network structure can effectively integrate information from different scales in images, improving performance and robustness.

3.3 Artistic image style conversion based on the introduced attention mechanism

The visual attention mechanism is an important way for humans to obtain key information. In complex scenes, humans prioritize capturing the target area and concentrate their attention to obtain more detailed information. This mechanism helps humans suppress useless information and quickly obtain information on key areas. The attention mechanism in computer vision is comparable to that of humans, focusing on key regions in images [27]. The visual attention mechanism based on deep learning is implemented through a mask mechanism, using weights to mark the important features of the image and forming attention through neural network learning. Soft attention focuses on regions or feature channels, and the attention weights are obtained through neural network learning, while hard attention focuses on pixel-level details. Each pixel may generate attention, which is typically achieved through reinforcement learning. The channel attention mechanism in CNNs is used to measure the importance of each feature channel, stimulate important channel information, suppress useless channel information, and highlight key areas in the image. This mechanism can improve the performance of deep CNNs [28]. An Efficient Channel Attention (ECA) network is introduced, as displayed in Figure 7.

Figure 7: ECA network

From Figure 7, the ECA network emphasizes the importance of direct correspondence between channels and weights by independently learning the weight of each channel, while avoiding dimensionality reduction operations. Meanwhile, by designing a one-dimensional convolutional kernel with adaptive size selection, cross-channel information exchange is achieved, which improves the effectiveness of channel attention and ensures both performance and model efficiency [29-30]. The channel attention module first performs a compression operation, as shown in equation (11).

z_c = \frac{1}{T N} \sum_{i=1}^{T} \sum_{j=1}^{N} u_c(i, j)    (11)

In equation (11), z represents the compressed feature map. u_c(i, j) represents the convolutional output feature map of channel c at spatial position (i, j). Excitation conversion is then performed on the compressed feature map, as shown in equation (12).

s = \sigma(W_2\, \delta(W_1 z))    (12)

In equation (12), s represents the converted feature map. \sigma represents the Sigmoid activation function. \delta stands for the ReLU activation function.
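The channel attention described by equations (11)–(12) can be sketched in PyTorch as follows. Note that equation (12) uses a two-matrix excitation, whereas the sketch follows the ECA design of a single 1-D convolution across channels; the adaptive kernel-size rule is the common ECA heuristic and is an assumption here, not a detail taken from the paper.

```python
import math
import torch
import torch.nn as nn

class ECALayer(nn.Module):
    """Efficient channel attention: global average pooling (equation (11)),
    a 1-D convolution across channels, and a sigmoid gate."""

    def __init__(self, channels: int, gamma: int = 2, b: int = 1):
        super().__init__()
        t = int(abs((math.log2(channels) + b) / gamma))
        k = t if t % 2 else t + 1  # kernel size must be odd
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.avg_pool(x)                               # z_c, equation (11): (N, C, 1, 1)
        s = self.conv(z.squeeze(-1).transpose(1, 2))       # cross-channel interaction: (N, 1, C)
        s = self.sigmoid(s.transpose(1, 2).unsqueeze(-1))  # channel weights: (N, C, 1, 1)
        return x * s                                       # re-weight the input channels
```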
The ICNN-MFFN-Attention Mechanism (ICNN-MFFN-AM) model is designed. Figure 8 displays the structure.

Figure 8: Lightweight image style conversion algorithm with attention mechanism (image conversion network composed of strided convolutional layers and ECA-Fire modules, followed by a pre-trained VGG content and style representation network with layers conv1–conv4 providing the content loss and style loss)

From Figure 8, the network structure of the algorithm includes an image conversion network and a content and style representation network. The image conversion network has an encoder, ECA-Fire modules, and a decoder. The encoder-decoder structure is applied to reduce computational complexity and increase the receptive field. The main body is composed of multiple ECA-Fire modules. The content and style representation network is a pre-trained VGG-16 network used to grasp the content and style features of images and to define the content loss and style loss. During training, pre-selected images read from the dataset are input into the network to calculate the content and style losses. The image conversion network parameters are updated through backpropagation. A lightweight style transfer model with a specific style is ultimately generated [31]. The image conversion network consists of five ECA-Fire modules, with residual connections used between the first and second modules, as well as between the fourth and fifth modules. The network mainly uses small convolution kernels with sizes of 3×3 or 1×1, while the first and last layers use 9×9 convolution kernels. The input is a 3-channel color content image with a resolution of 256×256. Down-sampling is achieved through a CL with a stride of 2. The corresponding up-sampling is achieved through a CL with a stride of 1/2 to adjust the channels and resolution of the output image to match the input image. This operation reduces computational complexity and can effectively increase the size of the receptive field. Down-sampling makes it convenient to use larger CLs for feature extraction, and the increase in effective receptive field also helps to improve the quality of image style conversion. The content and style representation network is essentially a pre-trained VGG-16 network used to grasp features from content images, converted images, and style images. These three types of images are input into the network and their activation responses in a certain layer of the network are extracted, which are called feature maps [32]. Content loss is not an exact pixel-level loss, but the mean square error between the feature maps extracted from the converted image and the content image in the network, representing their content similarity. After the CLs in the content and style representation network, the feature map size is represented as C_j × H_j × W_j, where C_j is the number of channels, H_j is the height, and W_j is the width. The content loss is defined as the mean square error between the features of the content image and the features of the converted image, as expressed in equation (13).

l_{content}(x_c, y) = \frac{1}{C_j H_j W_j} \| \phi_j(x_c) - \phi_j(y) \|_2^2    (13)

In equation (13), l_{content} represents the content loss function. x_c and y are the content image and the converted image, respectively. \phi_j represents the j-th CL of the content and style network \phi.
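A minimal sketch of the content loss in equation (13), together with the Gram-matrix style loss and total loss defined in equations (14)–(16) below, is given here. The function names, layer selection, and weight values are assumptions for illustration, not the authors' implementation.

```python
import torch

def content_loss(phi_y: torch.Tensor, phi_xc: torch.Tensor) -> torch.Tensor:
    """Equation (13): mean squared error between the VGG feature maps of the
    converted image y and the content image x_c at one chosen layer j."""
    return ((phi_y - phi_xc) ** 2).mean()

def gram_matrix(phi: torch.Tensor) -> torch.Tensor:
    """Equation (14): channel-by-channel inner products of a feature map,
    normalised by C_j * H_j * W_j."""
    n, c, h, w = phi.shape
    f = phi.view(n, c, h * w)
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

def style_loss(feats_y, feats_xs) -> torch.Tensor:
    """Equation (15): squared Frobenius distance between Gram matrices,
    summed over the selected VGG layers."""
    return sum(((gram_matrix(fy) - gram_matrix(fs)) ** 2).sum()
               for fy, fs in zip(feats_y, feats_xs))

def total_loss(phi_y, phi_xc, feats_y, feats_xs,
               lam1: float = 1.0, lam2: float = 1e4) -> torch.Tensor:
    """Equation (16): weighted combination; the weights are placeholders."""
    return lam1 * content_loss(phi_y, phi_xc) + lam2 * style_loss(feats_y, feats_xs)
```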
Style loss is applied to constrain the differences between the converted image y and the style image x_s, aiming to preserve style features such as color, texture, and common patterns. The Gram matrix is defined to represent the style information of a feature map, as expressed in equation (14).

G_j^{\phi}(x)_{c, c'} = \frac{1}{C_j H_j W_j} \sum_{h=1}^{H_j} \sum_{w=1}^{W_j} \phi_j(x)_{h, w, c}\, \phi_j(x)_{h, w, c'}    (14)

In equation (14), \phi_j(x) is reshaped into a matrix of size C_j × (H_j W_j), in which each feature map of H_j × W_j elements is flattened into a row vector. This means that \phi_j(x) is converted into a two-dimensional matrix of C_j × H_j W_j, and the Gram matrix is obtained from the inner product of this matrix with its transpose. The diagonal elements of the Gram matrix represent the feature map information itself. The other elements represent the correlation information between different feature maps, which can be used to measure the importance of features within themselves and between different features. The squared norm of the difference between the Gram matrices of the converted image y and the style image x_s is calculated. The differences calculated at each layer are added to obtain the final style loss, as expressed in equation (15).

l_{style}(x_s, y) = \sum_{j} \| G_j^{\phi}(x_s) - G_j^{\phi}(y) \|_F^2    (15)

The total loss is composed of a linear combination of the content loss and style loss, expressed as equation (16).

l_{total}(x_c, x_s, y) = \lambda_1\, l_{content}(x_c, y) + \lambda_2\, l_{style}(x_s, y)    (16)

In equation (16), \lambda_1 and \lambda_2 represent the weight coefficients of the content loss and style loss, respectively. The total loss is iteratively minimized, ultimately generating a lightweight style conversion model with a specific style. The stylized images generated by this model are comparable in quality to those of other models, but have advantages in terms of model size and speed, making it more convenient to achieve real-time image style conversion.

4 Analysis of artistic image style conversion based on the MSFF network

To analyze the effect of the research method on artistic image style conversion, it is first compared with other advanced methods and applied to artistic image style conversion. The research method performs well in various artistic image style conversions, with good visualization results.

4.1 Algorithm performance analysis

To analyze the artistic image style conversion effect of the research method, the performance of ICNN-MFFN-AM is first compared with that of ICNN-MFFN and Cycle-Consistent Generative Adversarial Networks (CycleGAN). Among them, CycleGAN can achieve image conversion between different styles through adversarial training between two generators and two discriminators. The experimental environment for all algorithms is consistent, as displayed in Table 2.

Table 2: Experimental environment
Number | Software and hardware project | Specific information
(1) | CPU | 12th Gen Intel(R) Core(TM) i5-12400F
(2) | GPU | NVIDIA GeForce RTX 3060 Ti
(3) | RAM | 12 GB
(4) | Operating system | Windows 10, 64-bit
(5) | Anaconda version | 4.6.11
(6) | Hard disk | 1 TB
(7) | CUDA version | 9.0
(8) | Python version | 3.7.3

The experiment uses 12 GB of DDR4 RAM, which meets the memory requirement when processing large-scale image datasets. The GPU model is an NVIDIA GeForce RTX 3060 Ti, which provides efficient parallel computing power to accelerate the training and inference of deep learning models.
Anaconda version 4.6.11 is used to manage the Python environment and dependency packages, with Python version 3.7.3 as the programming language and the PyTorch 0.4.0 deep learning framework for building and training CNN models. The content image dataset is MSCOCO 2017, which is widely used for computer vision tasks and contains 118,287 images of everyday life scenes. The style image dataset WikiArt is adopted. WikiArt is a dataset containing many artistic style images, downloaded from Kaggle, with approximately 80,000 images. The preprocessing steps ensure the consistency and availability of the dataset: all images are confirmed to be in RGB format, resized uniformly to 256×256 pixels, damaged or inconsistent images are removed, and pairs of content images and style images are established so that there are enough samples for style conversion training. A two-sample t-test is performed on the experimental results to verify whether the performance difference between the proposed method and the existing methods is statistically significant. 95% confidence intervals are calculated to evaluate the reliability of the experimental results. To ensure the reliability and consistency of the experimental results, a fixed random seed is used to make the random initialization process repeatable. The experiment is repeated multiple times under the same conditions, and the mean and standard deviation of the results are calculated. In the comparative experiments, all algorithms use the same VGG-16 backbone network. Four GPU servers are used as computing nodes. Each node processes one type of image. The environment configuration of each node is consistent. The usage and loss of each node differ, resulting in differences in model training time. To control variables, the detection speed of the trained models is tested on the same computing node to ensure rigor. 300 images are randomly selected for testing. The batch size is 2. The learning rate is 1×10⁻⁴, with a total of 50,000 iterations. The time for converting images using the three methods is shown in Figure 9.

Figure 9: Artistic image style conversion time (conversion time in seconds versus the number of images ×10² for CycleGAN, ICNN-MFFN, and ICNN-MFFN-AM)

From Figure 9, as the number of images increased, the conversion time of the three algorithms also increased. However, the conversion time of ICNN-MFFN-AM was significantly shorter than that of CycleGAN and ICNN-MFFN. When converting 8000 images, the conversion time of ICNN-MFFN-AM, CycleGAN, and ICNN-MFFN was 61.2 s, 118.5 s, and 137.6 s, respectively. ICNN-MFFN-AM had higher efficiency in artistic image style conversion. The CPU and memory usage during the artistic image style conversion process using the three algorithms are shown in Figure 10.

Figure 10: CPU usage (a) and memory usage (b) during artistic image style conversion, plotted against the number of images for CycleGAN, ICNN-MFFN, and ICNN-MFFN-AM

Figures 10 (a) and 10 (b) respectively show the CPU and memory usage for converting image artistic styles.
From Figure 10, when more images were converted, the CPU and memory usage also increased and then gradually stabilized. Overall, ICNN-MFFN-AM had lower CPU and memory usage, which means that ICNN-MFFN-AM had the best artistic image style conversion performance.

4.2 The quality conversion results of different artistic image styles

Artistic image style conversion is a digital image processing technique. It uses computer vision and deep learning algorithms to re-render an image (called a "content image") with the artistic style of another image (called a "style image"). The effects of converting ordinary photos into Van Gogh style, Ukiyo-e style, Monet style, and Cézanne style are compared. The Structural Similarity Index (SSI) and Peak Signal-to-Noise Ratio (PSNR) are used to assess the similarity and distortion between the generated image and the source-domain image. Firstly, an ablation experiment is conducted on Van Gogh artistic style conversion. The results are shown in Figure 11.

Figure 11: The result of Van Gogh style conversion ((a)–(d) PSNR and SSI values of CycleGAN, ICNN-MFFN, and ICNN-MFFN-AM for pictures A–D)

Figures 11 (a), 11 (b), 11 (c), and 11 (d) represent the average PSNR and SSI values of converting images A, B, C, and D into Van Gogh artistic style images using the three algorithms 50 times, respectively. PSNR evaluates distortion through the ratio between the maximum pixel value and the mean square error, with higher values indicating better image quality. SSI is based on similarities in brightness, contrast, and structure, with larger values indicating that the generated image is closer to the source-domain image. From Figure 11, the PSNR and SSI values of the three algorithms were, in descending order, ICNN-MFFN-AM, CycleGAN, and ICNN-MFFN. ICNN-MFFN-AM preserved the high quality of the content images when converting the four content images into Van Gogh artistic style images. The error was smaller, which could better reflect the subjective evaluation of image quality, while retaining the structural information. The style conversion results for Ukiyo-e are shown in Figure 12.

Figure 12: Result of Ukiyo-e artistic style conversion ((a) PSNR values and (b) SSI values of CycleGAN, ICNN-MFFN, and ICNN-MFFN-AM for pictures A–D)

Figures 12 (a) and 12 (b) respectively represent the PSNR and SSI values of the three algorithms for converting the four content images into Ukiyo-e style images. From Figure 12, the PSNR values of the four content images converted by ICNN-MFFN-AM were 16.259, 17.884, 16.644, and 15.985, respectively. The standard deviation of the PSNR values for ICNN-MFFN-AM was 0.131. The SSI values were 0.752, 0.756, 0.709, and 0.783, respectively, and the standard deviation of the SSI values of ICNN-MFFN-AM was 0.193. The PSNR and SSI values for ICNN-MFFN-AM were significantly higher than those of CycleGAN and ICNN-MFFN, and its standard deviations were lower than those of CycleGAN and ICNN-MFFN.
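For reference, PSNR and SSI scores of the kind reported in Figures 11–14 can be computed as in the following sketch. The use of scikit-image (version 0.19 or later for the channel_axis argument) and the uint8 RGB assumption are illustrative choices, not details specified in the paper.

```python
import numpy as np
from skimage.metrics import structural_similarity

def psnr(reference: np.ndarray, generated: np.ndarray, data_range: float = 255.0) -> float:
    """Peak signal-to-noise ratio between a source-domain image and a stylised
    result; higher values mean less distortion."""
    mse = np.mean((reference.astype(np.float64) - generated.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * float(np.log10(data_range ** 2 / mse))

def ssi(reference: np.ndarray, generated: np.ndarray) -> float:
    """Structural similarity on brightness, contrast, and structure;
    channel_axis=-1 assumes H x W x 3 uint8 RGB arrays."""
    return float(structural_similarity(reference, generated,
                                        channel_axis=-1, data_range=255))
```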
The results of Monet style conversion are shown in Figure 13.

Figure 13: Monet style conversion results ((a)–(d) PSNR and SSI values of CycleGAN, ICNN-MFFN, and ICNN-MFFN-AM for pictures A–D)

Figures 13 (a), 13 (b), 13 (c), and 13 (d) present the average PSNR and SSI values of images A, B, C, and D converted into Monet artistic style images using the three algorithms 50 times, respectively. From Figure 13, compared with CycleGAN and ICNN-MFFN, ICNN-MFFN-AM also had higher PSNR and SSI values, indicating that ICNN-MFFN-AM retained more information. It had smaller errors when converting content images into Monet artistic style images. The results of Cézanne style conversion are shown in Figure 14.

Figure 14: Cézanne style conversion results ((a) PSNR values and (b) SSI values of CycleGAN, ICNN-MFFN, and ICNN-MFFN-AM for pictures A–D)

Figures 14 (a) and 14 (b) respectively represent the PSNR and SSI values of the three algorithms for converting the four content images into Cézanne style images. From Figure 14, the PSNR values of image B converted by ICNN-MFFN-AM, CycleGAN, and ICNN-MFFN were 22.291 (P<0.05), 21.331 (P<0.05), and 19.844 (P<0.05), respectively. The standard deviations of the PSNR values for ICNN-MFFN-AM, CycleGAN, and ICNN-MFFN were 0.122, 0.231, and 0.063, respectively. The SSI values of ICNN-MFFN-AM, CycleGAN, and ICNN-MFFN were 0.843 (P<0.05), 0.801 (P<0.05), and 0.714 (P<0.05), respectively. The standard deviations of the SSI values for ICNN-MFFN-AM, CycleGAN, and ICNN-MFFN were 0.041, 0.063, and 0.084, respectively. ICNN-MFFN-AM was more likely to inherit the color and texture information of the source image when converting styles, achieving the best image conversion effect. The above results indicate that ICNN-MFFN-AM has superior image conversion performance and significant advantages in image conversion tasks. The visualization results of ICNN-MFFN-AM converting the four content images into Van Gogh style, Ukiyo-e style, Monet style, and Cézanne style are shown in Figure 15.

Figure 15: Visualization results of artistic image style conversion

Figures 15 (a), 15 (b), 15 (c), and 15 (d) present the results of ICNN-MFFN-AM converting the four content images into Van Gogh style, Ukiyo-e style, Monet style, and Cézanne style, respectively. From Figure 15, ICNN-MFFN-AM could naturally convert content images into different artistic styles while retaining the original content.

5 Conclusion

The artistic image style conversion technology aims to convert the style of one image into that of another image, which has broad application prospects.
However, existing methods have certain limitations in processing large-scale image data and performing style conversion while preserving content. This study converted artistic image styles based on the MSFF network and an attention mechanism. The main contributions of the research included an improved CNN structure, which enhanced the feature extraction capability through adaptive normalization and multi-scale feature fusion technology. An efficient channel attention mechanism was introduced, which enabled the network to focus more on the key features of the image and improved the naturalness and accuracy of style conversion. The results showed that when converting 8000 images, the conversion time of ICNN-MFFN-AM, CycleGAN, and ICNN-MFFN was 61.2 s, 118.5 s, and 137.6 s, respectively. ICNN-MFFN-AM had higher efficiency in artistic image style conversion, with lower CPU and memory usage. The PSNR and SSI values of the three algorithms in descending order were ICNN-MFFN-AM, CycleGAN, and ICNN-MFFN. When converting content images into different artistic styles, ICNN-MFFN-AM could naturally convert them into the required artistic style while retaining the original content. The artistic image style conversion method based on the MSFF network proposed in this study has achieved significant improvements in image conversion quality and speed. In future research, from the perspective of practical application, the study will investigate how to incorporate user interaction, allowing users to guide the style conversion process and generate image styles that better match user expectations.

6 Discussion

The proposed artistic image style conversion method realizes efficient style conversion of content images through the multi-scale feature fusion network and attention mechanism. First, compared with the unsupervised dense network proposed by Zhou et al. [4], the research method is more refined in feature extraction. The introduced attention mechanism pays more attention to the important features of artistic images, so as to better retain the details of content images during style conversion. Compared with the traditional CNN style conversion algorithm, this study significantly improves the detail effect and overall quality of style conversion by improving the CNN structure and using adaptive normalization and multi-scale feature fusion techniques. In addition, by decomposing the large convolution kernels in the conversion network, the number of parameters is reduced and the conversion speed is increased, addressing the slow operation speed of traditional methods. Compared with advanced methods such as CycleGAN, the research method showed lower consumption in conversion time, CPU usage, and memory usage, indicating higher efficiency. Thanks to the design of the multi-scale feature fusion network, the network can consider local details and overall structure at different levels simultaneously, thus achieving a better balance in the task of artistic image style conversion. The multi-scale feature fusion network combined with the attention mechanism provides a new perspective on artistic image style conversion. This combination not only improves the quality and efficiency of style conversion, but also achieves a deeper understanding and expression of artistic style through more detailed feature extraction and fusion.

7 Fundings

The research is supported by: The Philosophy and Social Science Research Project in Universities of Anhui Province in 2023, (No.
2023AH040158); Horizontal scientific research project of Hefei Normal University in 2023, (No. HXXM2023018); The Project of Supporting Outstanding Young Talents in Universities of Anhui Province in 2019, (No. gxyq2019065). References [1] Li Dong, Zheng Liang, and Yue Wang. Graph convolutional network-based image matting algorithm for computer vision applications. IET image processing, 16(10):2817-2825, 2022. https://doi.org/10.1049/ipr2.12528 [2] Gangming Zhao, Kongming Liang, Chengwei Pan, Fandong Zhang, Xianpeng Wu, Xinyang Hu, and Yizhou Yu. Graph convolution based cross-network multiscale feature fusion for deep vessel segmentation. IEEE transactions on medical imaging, 42(1):183-195, 2023. https://doi.org/10.1109/TMI.2022.3207093 [3] Hao-Hsiang Yang, Kuan-Chih Huang, and Wei-Ting Chen. LAFFNet: A lightweight adaptive feature fusion network for underwater image enhancement. IET image processing, 15(3):774-785, 2021. https://doi.org/10.48550/arXiv.2105.01299 [4] Ding Zhou, Xin Jin, Qian Jiang, Li Cai, Shin-jye Lee, and Shaowen Yao. MCRD-Net: An unsupervised dense network with multi-scale convolutional block attention for multi-focus image fusion. IET image processing, 16(6):1558-1574, 2022. https://doi.org/10.1049/ipr2.12430 [5] Lijia Deng, Shui-Hua Wang, and Yu-Dong Zhang. ELMGAN: A GAN-based efficient lightweight multi-scale-feature-fusion multi-task model. Knowledge-based systems, 252:109434.1-109434.12, 2022. https://doi.org/10.1016/j.knosys.2022.109434 [6] Yongjie Wang, Wei Zhang, and Yanyan Liu. Multi-scale feature fusion network for person re-identification. IET image processing, 14(17):4614-4620, 2020. https://doi.org/10.1049/iet-ipr.2020.0008 [7] Luyang Wang, Yun Li, Sifan Peng, Xiao Tang, and Baoqun Yin. Multi-level feature fusion network for crowd counting. IET computer vision, 15(1):60-72, 2021. https://doi.org/10.1049/cvi2.12012 [8] Jinyue Shen, Zhouzhou Zheng, Yingwei Sun, Mengmeng Zhao, Yankang Chang, Yuyi Shao, and Yan Zhang. HAMNet: Hyperspectral image classification based on hybrid neural network with attention mechanism and multi-scale feature fusion. International journal of remote sensing, 43(11/12):4233-4258, 2022. https://doi.org/10.1080/01431161.2022.2109222 [9] Yanan Sun, Bing Xue, Mengjie Zhang, and Gary G. Yen. Evolving deep convolutional neural networks for image classification. IEEE transactions on evolutionary computation, 24(2):394-407, 2020. https://doi.org/10.1109/TEVC.2019.2916183 [10] Ying Bi, Bing Xue, and Mengjie Zhang. A divide-and-conquer genetic programming algorithm with ensembles for image classification. IEEE transactions on evolutionary computation, 25(6):1148-1162, 2021. https://doi.org/10.1109/TEVC.2021.3082112 [11] Yunping Zheng, Xiangpeng Li, and Mudar Sarem. Fast fractal image compression algorithm using specific update search. IET image processing, 14(9):1733-1739, 2020. https://doi.org/10.1049/iet-ipr.2019.0522 [12] Osama A.S. Alkishriwo. Image compression using adaptive multiresolution image decomposition algorithm. IET image processing, 14(14):3572-3578, 2020. https://doi.org/10.1049/iet-ipr.2019.1699 [13] Sunil L. Tade, and Vibha Vyas. Hybrid deep emperor penguin classifier algorithm-based image quality assessment for visualisation application in HDR environments. IET image processing, 14(11):2579-2587, 2020. https://doi.org/10.1049/iet-ipr.2019.1371 [14] Jiawei Yuan, Hai-Lin Liu, Yew-Soon Ong, and Zhaoshui He. Indicator-based evolutionary algorithm for solving constrained multi-objective optimization problems. 
IEEE transactions on evolutionary computation, 26(2):379-391, 2022. https://doi.org/10.1109/TEVC.2021.3089155 [15] Guilherme Paim, Hussam Amrouch, Leandro M. G. Rocha, Brunno Abreu, Eduardo Antônio César da Costa, Sergio Bampi, Jörg Henkel. A framework for crossing temperature-induced timing errors underlying hardware accelerators to the algorithm and application layers. IEEE transactions on computers, 71(2):349-363, 2022. https://doi.org/10.1109/TC.2021.3050978 [16] Zhenshou Song, Handing Wang, Cheng He, and Yaochu Jin. A kriging-assisted two-archive evolutionary algorithm for expensive many-objective optimization. IEEE transactions on evolutionary computation, 25(6):1013-1027, 2021. https://doi.org/10.1109/TEVC.2021.3073648 [17] Dawei Zhan, and Huanlai Xing. A fast kriging-assisted evolutionary algorithm based on incremental learning. IEEE transactions on evolutionary computation, 5(5):941-955, 2021. https://doi.org/10.1109/TEVC.2021.3067015 [18] Abhinav Tomar, Lalatendu Muduli, and Prasanta K. Jana. A fuzzy logic-based on-demand charging algorithm for wireless rechargeable sensor networks with multiple chargers. IEEE transactions on mobile computing, 20(9):2715-2727, 2021. https://doi.org/10.1109/TMC.2020.2990419 [19] Jian-Yu Li, Zhi-Hui Zhan, Hua Wang, and Jun Zhang. Data-driven evolutionary algorithm with perturbation-based ensemble surrogates. IEEE transactions on cybernetics, 51(8):3925-3937, 2021. https://doi.org/10.1109/TCYB.2020.3008280 [20] Hanbo Zheng, Yonghui Sun, Xinghua Liu, Calvin Laurent Tcheteu Djike, Jinheng Li, Yang Liu, Jianchao Ma, Kai Xu, and Chaohai Zhang. Infrared image detection of substation insulators using an improved fusion single shot multibox detector. IEEE transactions on power delivery, 36(6):3351-3359, 2021. https://doi.org/10.1109/TPWRD.2020.3038880 [21] Zi-Han Zhang, Xiao-Jun Wu, and Tianyang Xu. FPNFuse: A lightweight feature pyramid network for infrared and visible image fusion. IET image processing, 16(9):2308-2320, 2022. https://doi.org/10.1049/ipr2.12473 [22] Wenjun Tan, Pan Liu, Xiaoshuo Li, Shaoxun Xu, Yufei Chen, and Jinzhu Yang. Segmentation of lung airways based on deep learning methods. IET image processing, 16(5):1444-1456, 2022. https://doi.org/10.1049/ipr2.12423 [23] Kavita Bhosle, and Vijaya Musande. Evaluation of deep learning CNN model for recognition of devanagari digit. Artificial intelligence and applications, 1(2):114-118, 2023. https://doi.org/10.47852/bonviewAIA3202441 [24] Rui Guo, Yong Zhou, Jiaqi Zhao, Yiyun Man, Minjie Liu, Rui Yao, and Bing Liu. Point cloud classification by dynamic graph CNN with adaptive feature fusion. IET computer vision, 15(3):235-244, 2021. https://doi.org/10.1049/cvi2.12039 [25] Padmaprabha Preethi, and Hosahalli Ramappa Mamatha. Region-based convolutional neural network for segmenting text in epigraphical images. Artificial intelligence and applications, 1(2):119-127, 2023. https://doi.org/10.47852/bonviewAIA2202293 [26] Zhong Qu, Xue Shang, Shu-Fang Xia, Tu-Ming Yi, and Dong-Yang Zhou. A method of single-shot target detection with multi-scale feature fusion and feature enhancement. IET image processing, 16(6):1752-1763, 2022. https://doi.org/10.1049/ipr2.12445 [27] Zunlin Fan, Naiyang Guan, Zhiyuan Wang, Longfei Su, Jiangang Wu, and Qianchong Sun. Unified framework based on multiscale transform and feature learning for infrared and visible image fusion. Optical engineering, 60(12):123102-1-123102-16, 2021.
https://doi.org/10.1117/1.OE.60.12.123102 [28] Zhilin He. Improved genetic algorithm in multi-objective cargo logistics loading and distribution. Informatica, 47(2), 2023. https://doi.org/10.31449/inf.v47i2.3958 [29] Ziyu Chen, Huaiyu Zhuang, Jia Han, Yani Cui, and Jiaxian Deng. Multi-scale single image dehazing based on the fusion of global and local features. IET image processing, 16(8):2049-2062, 2022. https://doi.org/10.1049/ipr2.12467 [30] Wen Yang, Ming Zhan, Zhijun Huang, and Wei Sha. Design and development of mobile terminal application based on Android. Informatica, 47(2), 2023. https://doi.org/10.31449/inf.v47i2.4008 [31] Shuhui Zhang, Chenglin Zheng, and Xi Chen. SyPSE: A symbolic computation toolbox for process systems engineering part I- architecture and algorithm development. Industrial & engineering chemistry research, 60(45):16304-16316, 2021. https://doi.org/10.1021/acs.iecr.1c02151 [32] Syed Ahmed Nadeem, Eric A. Hoffman, Jessica C. Sieren, Alejandro P. Comellas, Surya P. Bhatt, Igor Z. Barjaktarevic, Fereidoun Abtin, and Punam K. Saha. A CT-based automated algorithm for airway segmentation using freeze-and-grow propagation and deep learning. IEEE transactions on medical imaging, 40(1):405-418, 2021. https://doi.org/10.1109/TMI.2020.3029013