https://doi.org/10.31449/inf.v47i8.4862 Informatica 47 (2023) 13–18

A Study on Error Feature Analysis and Error Correction in English Translation Through Machine Translation

Guifang Tao
Wuhan Vocational College of Communications and Publishing, Wuhan, Hubei 430223, China
E-mail: gui6095@163.com

Keywords: machine translation, transformer model, grammatical error detection

Received: May 10, 2023

English translation is the problem most frequently encountered in English learning, and fast, efficient, and correct English translation has become a widespread demand. This paper studied grammatical errors, the most frequent error type in English translation, using the Transformer grammatical error correction model in machine translation, and explored whether machine translation can analyze the features of the errors that may occur in English translation and correct them. The results showed that the precision of the Transformer model reached 93.64%, the recall rate reached 94.01%, the F0.5 value was 2.35, and the Bilingual Evaluation Understudy (BLEU) value was 0.94, all better than those of the other three models. The Transformer model also showed stronger error correction performance than the Seq2seq, convolutional neural network, and recurrent neural network models in analyzing error correction instances of English translation. This paper demonstrates that identifying and correcting English translation errors by machine translation based on the Transformer model is feasible and practical.

Povzetek: Jezikovni pretvornik je bil uporabljen za odkrivanje napak v prevodu v angleščino. Rezultati so pokazali visoko točnost v primerjavi z drugimi pristopi.

1 Introduction

English is a universal language, and countless people are learning it; however, many of them are not native English speakers and are prone to English translation errors. To reduce such translation errors, people use computer programs for English translation, i.e., machine translation. With the advent of big data and deep learning technology [1], machine translation has been optimized, translation quality has improved, and machine translation has become an important aid to human translation. This paper studied grammatical errors, the most frequent error type in English translation, using a Transformer-based grammatical error correction model, and explored whether machine translation can analyze the features of errors in English translation and correct them. The experimental results of several models, namely the Seq2seq model, the convolutional neural network (CNN) model, and the recurrent neural network (RNN) model, were compared using evaluation indicators to verify that the Transformer-based grammatical error correction model has practical application value.

2 Related works

Table 1: Summary of relevant work

| Study | Model | Results | Limitations |
| --- | --- | --- | --- |
| Satir et al. [2] | A hybrid system guiding NMT decoding using the output sentences of phrase-based SMT systems | The proposed method can improve translation quality. | Application in different languages |
| Hnamte et al. [3] | Neural machine translation | Achieves a 42.65 BLEU score on 4-grams. | Dependence on the corpus during model training |
| Singh et al. [4] | A semi-supervised neural machine translation system | The proposed semi-supervised system outperforms the supervised, the pre-trained mBART, and existing semi-supervised baselines in terms of automatic scores and subjective evaluation parameters by a significant margin, with up to +4.5 and +1.2 BLEU improvements over the supervised and mBART baselines, respectively. | The proposed semi-supervised approach is robust in handling rare words and long-term dependencies, as evident from the error analysis based on word translation accuracy and BLEU scores grouped by sentence length; however, an imbalanced combination of the synthetic data is found to deteriorate the overall performance. |
| Laskar et al. [5] | An Assamese pre-trained language model | With the use of both prior alignment and a pre-trained language model, the Transformer-based neural machine translation model shows improvement, achieving state-of-the-art results for English-to-Assamese and Assamese-to-English translation, respectively. | The multilingual transfer learning-based approach is left for further research. |
| Loubser et al. [6] | Neural network models for core language technologies | The neural model performs comparably with the baseline on Afrikaans and disjunctive languages (accuracy within 1%) and slightly worse on conjunctive languages, falling short of the baseline by 2.3% on average. Neural networks can be viable implementations of core language technologies for resource-scarce South African languages. | The experiments evaluated only one neural architecture for each task. |

The analysis of the current related research shows that, in machine translation research, further improving translation quality and extending machine translation to more languages remain focal points. Because machine translation output is often of low quality, many translation errors occur, and this problem can be effectively mitigated by improving translation quality. Therefore, this paper investigates the problem of detecting and correcting errors in English translation.

3 English translation error correction method

3.1 Analysis of English translation error characteristics

English is currently the most widely spoken language in the world [7], and the most common difficulty in learning English is English translation. Since languages differ from country to country, factors such as Chinglish and insufficient vocabulary lead to frequent translation errors. The analysis of the English translation error characteristics of the dataset studied in this paper reveals that most of the errors are grammatical translation errors, which fall into two categories: lexical errors and syntactic errors. Lexical errors mainly include word-choice errors and singular/plural mistranslation, while syntactic errors mainly include subject mistranslation, confusion of logical relations, and verb tense errors [8].
For example, in terms of vocabulary errors, "红茶" is translated as "red tea", but the correct translation is "black tea". In terms of singular/plural mistranslation, "她有5天的带薪年假" is translated as "She has five day of paid annual leave", but the correct form is "five days". In terms of grammatical errors, "我们的土地已经得到很好的开发" is translated as "Our land has well developed", but the correct form is "has been well developed". In terms of verb tense errors, "他说他第二天要去爬山" is translated as "He said he will go climbing the next day", but the correct form is "would".

3.2 Transformer model

In this paper, the Transformer model is used as an error corrector for machine translation. The Transformer error correction model is a neural machine translation model [9] with high training efficiency that can automatically correct a large number of recognition errors, especially substitution errors in recognition results. The model consists of two parts: the encoder and the decoder. The encoder consists of a self-attention layer and a feedforward neural network layer. The self-attention layer contains multi-head attention, summation, and normalization, and the feedforward neural network layer contains a feedforward neural network, summation, and normalization. The decoder consists of a self-attention layer, an encoder-decoder attention layer, and a feedforward neural network layer, each also followed by summation and normalization. The equation for the self-attention layer is:

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V, \quad (1)

where Q is the query vector, K is the key vector, V is the value vector, \sqrt{d_k} is a fixed scaling factor, and d_k is the dimension of the hidden layer. The Transformer model splits the self-attention mechanism into (Q, K, V) and applies multi-head attention in the self-attention layer, where the Q query vector, K key vector, and V value vector all come from the output of the previous sub-layer.

The Transformer model for error correction works by inputting the source error sentence, outputting the corresponding feature vectors after encoder processing, feeding the feature vectors to the decoder for further processing, and outputting the target corrected sentence. In the training process of the Transformer-based error correction model, since the model contains neither convolutional nor recurrent networks, the sequential order of words in the data cannot be obtained, so the model needs to be embedded with position encoding features; the position information of words is added to the embedding vector to allow the model to distinguish words in different positions. The position coding used in this paper is sine/cosine position coding [10], given by Equations (2) and (3):

PE_{(pos,\,2i)} = \sin\left(\frac{pos}{10000^{2i/d_{model}}}\right), \quad (2)

PE_{(pos,\,2i+1)} = \cos\left(\frac{pos}{10000^{2i/d_{model}}}\right), \quad (3)

where PE is the position encoding, pos is the position of a word in a sentence, i indexes the components of the position encoding, and d_{model} is the uniform dimension of the inputs and outputs of the different layers in the model. Even-numbered components of the encoding use the sine form, and odd-numbered components use the cosine form.
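To make Equations (1)–(3) concrete, the following is a minimal NumPy sketch of scaled dot-product attention and sinusoidal position encoding. It is an illustrative reconstruction under the paper's stated dimension (d_model = 512), not the implementation used in this study; the function names and toy shapes are assumptions.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Equation (1): Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # weighted sum of value vectors

def sinusoidal_position_encoding(max_len, d_model):
    """Equations (2)-(3): sine for even components, cosine for odd components."""
    pos = np.arange(max_len)[:, None]                 # word positions 0..max_len-1
    i = np.arange(0, d_model, 2)[None, :]             # even component indices 2i
    angle = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angle)                       # PE(pos, 2i)
    pe[:, 1::2] = np.cos(angle)                       # PE(pos, 2i+1)
    return pe

# Toy usage: add position information to 10 token embeddings, then self-attend
# (Q, K, and V all come from the same source, as in the encoder's first sub-layer).
x = np.random.randn(10, 512) + sinusoidal_position_encoding(10, 512)
out = scaled_dot_product_attention(x, x, x)
```

Because the position encoding is added to the embedding vector rather than concatenated, the model can distinguish identical words at different positions without any extra parameters.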
4 Experimental analysis

4.1 Data collection and processing

The dataset for this study comes from the British National Corpus (BNC), Lang-8, the Tsinghua University Chinese-English Parallel Corpus (THUMT), and the CoNLL-2014 test set. Five million pieces of data were used as the initial data for this study. Three million pieces were used as the training set for continuous training and adjustment of the model, and the remaining two million pieces were used as the test set to evaluate the final machine translation results of the model. The data specifications are listed in Table 2.

Table 2: Experimental data sources and quantities

| Data distribution | Corpus name | Number of data (pieces) |
| --- | --- | --- |
| Training set data | British National Corpus (BNC) | 1.5 million |
| Training set data | Lang-8 | 1.5 million |
| Test set data | Tsinghua University Chinese-English Parallel Corpus (THUMT) | 1 million |
| Test set data | CoNLL-2014 test set [11] | 1 million |

After the experimental data were collected, they were processed to meet the input requirements of the model. The data processing methods used in this paper are as follows. First, heavily duplicated data in the dataset were deleted to avoid skewing the machine translation results during model training. Second, the length of the data in the dataset was unified so that long data did not need to be split; the maximum length was set to 100, and any part exceeding this length was truncated. Third, special symbols, such as ¥ and m², were handled. The data inevitably contain some special symbols, and since the model could not translate them, the data containing special symbols were deleted. Fourth, the initial data were transformed into grammatically incorrect data by noise perturbation. The experiment aimed to prove that machine translation based on the Transformer model could correct wrong English translations, so data containing a large number of grammatical errors were needed. The sentences in the dataset were processed with Byte Pair Encoding (BPE) [12], and then tokens were deleted, replaced, or inserted with random probability, generating data containing a large number of grammatical errors for the model to analyze; a sketch of this noising step is given below.
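The following is a minimal Python sketch of the noise-perturbation step described above. It assumes an already-tokenized sentence and uniform per-token corruption probabilities; the probability value, the substitution vocabulary, and the function name are illustrative assumptions rather than the paper's reported settings, and the BPE segmentation itself is omitted.

```python
import random

def perturb(tokens, vocab, p=0.1, seed=None):
    """Corrupt a token sequence by random deletion, replacement, or insertion.

    tokens: list of (sub)word tokens, e.g. after BPE segmentation.
    vocab:  pool of tokens to draw replacements/insertions from (assumed).
    p:      per-token probability of each corruption type (illustrative value).
    """
    rng = random.Random(seed)
    noisy = []
    for tok in tokens:
        r = rng.random()
        if r < p:                      # delete the token
            continue
        elif r < 2 * p:                # replace it with a random vocabulary token
            noisy.append(rng.choice(vocab))
        elif r < 3 * p:                # keep it and insert a spurious token after it
            noisy.append(tok)
            noisy.append(rng.choice(vocab))
        else:                          # leave the token unchanged
            noisy.append(tok)
    return noisy

# Toy usage: corrupt a correct sentence to create a (noisy, clean) training pair.
clean = "she has five days of paid annual leave".split()
vocab = ["day", "a", "the", "have", "is"]       # hypothetical substitution pool
noisy = perturb(clean, vocab, p=0.1, seed=42)
pair = (" ".join(noisy), " ".join(clean))       # model input, correction target
```

Pairing each noisy sentence with its clean source yields the (erroneous input, corrected target) parallel pairs that the error correction model is trained on.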
4.2 Experimental design

The Transformer-based error correction model was used as the training model. In this model, both the encoder and the decoder had six layers, multi-head attention used eight heads, the dimension of the hidden layer of the feedforward network was 2,048, and the dimension of the word embedding was 512. Before the experiment, enough data were collected for the model analysis. Then, the dataset was divided and preprocessed to meet the training requirements of the Transformer-based model, which was trained and fine-tuned to reach its optimum on the training set data. After the optimal model was obtained, the test set data were input to the model for machine translation. Two evaluation indicators, M² and Bilingual Evaluation Understudy (BLEU), were used to evaluate the translation results and verify the effectiveness of the model. Moreover, the translation results were compared with the machine translation results of the Seq2seq [13], CNN [14], and RNN [15] models to confirm that the Transformer-based error correction model is of practical value in machine translation self-correction.

In addition, the initial parameters of the different models were set uniformly to ensure the reliability and validity of the experimental results. The Adam optimization function was used during model training, the learning rate was set to 0.001, the dropout rate was set to 0.5, the training batch size was 100, and the number of iterations was set to 40.

4.3 Evaluation indicators

4.3.1 The maximum matching score (M²)

The maximum matching score (M²) [16] is one of the most commonly used methods for assessing English grammatical error correction models. It is the first evaluation index used in this paper, with precision (P), recall rate (R), and the F0.5 value as the main sub-indexes. Their calculation formulae are:

P = \frac{\sum_{i=1}^{n} |A_i \cap B_i|}{\sum_{i=1}^{n} |A_i|}, \quad (4)

R = \frac{\sum_{i=1}^{n} |A_i \cap B_i|}{\sum_{i=1}^{n} |B_i|}, \quad (5)

where A_i is the set of corrective edits output by the model and B_i is the set of corrective edits in the manual annotation. F0.5 was chosen as the evaluation indicator because the accuracy of machine translation is valued more than coverage in the error correction model, so precision is weighted at twice the recall rate. The F0.5 value combines precision and recall: the higher the F0.5 value, the better the translation output of the model, and vice versa. The calculation formula is:

F_{0.5} = \frac{(1 + 0.5^2) \cdot P \cdot R}{0.5^2 \cdot P + R}. \quad (6)

4.3.2 BLEU value

The second evaluation indicator is BLEU [17], a very common metric in machine translation evaluation, which measures the difference between the model-generated translation text and the actual correct text. Its value lies between 0 and 1: a perfect match between the two texts yields a BLEU value of 1, and the value decreases toward 0 as the overlap between them decreases. The metric computations are sketched below.
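As an illustration of Equations (4)–(6), the sketch below computes micro-averaged precision, recall, and F0.5 from per-sentence edit sets, and shows a sentence-level BLEU comparison using NLTK. The edit representation (span-plus-replacement tuples) is an assumption for illustration; it is not the official M² scorer, which additionally searches for the maximally matching edit set rather than taking the edits as given.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def m2_scores(system_edits, gold_edits, beta=0.5):
    """Equations (4)-(6): micro-averaged P, R, and F_beta over edit sets.

    system_edits, gold_edits: one set of edits per sentence; an edit here is
    any hashable tuple, e.g. (start, end, replacement) (assumed encoding).
    """
    tp = sum(len(a & b) for a, b in zip(system_edits, gold_edits))
    sys_total = sum(len(a) for a in system_edits)    # denominator of Eq. (4)
    gold_total = sum(len(b) for b in gold_edits)     # denominator of Eq. (5)
    p = tp / sys_total if sys_total else 0.0
    r = tp / gold_total if gold_total else 0.0
    f = ((1 + beta**2) * p * r / (beta**2 * p + r)) if (p + r) else 0.0
    return p, r, f

# Toy example: the model proposes two edits for one sentence; one matches gold.
system = [{(2, 3, "days"), (5, 6, "the")}]
gold = [{(2, 3, "days")}]
print(m2_scores(system, gold))   # P = 0.5, R = 1.0, F0.5 ≈ 0.556

# Sentence-level BLEU between a corrected hypothesis and its reference.
ref = "the newspapers are full of the latest news".split()
hyp = "the newspapers are full of the newest news".split()
print(sentence_bleu([ref], hyp, smoothing_function=SmoothingFunction().method1))
```

With beta = 0.5, precision contributes twice as much as recall, matching the preference stated above for accurate corrections over exhaustive ones.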
4.4 Analysis of results

As seen in Figure 1, the word error rate decreased as the number of hidden layers increased, but too many hidden layers resulted in a large number of neurons, which increased the computational load of the model. In Figure 1, the word error rate of the four models was lowest when the number of hidden layers was 6; beyond that, the word error rate tended to increase even as the number of hidden layers grew. Therefore, the number of hidden layers was set to 6 to avoid the increased computation and decreased accuracy caused by an excessive number of hidden layers.

Figure 1: The number of batches and BLEU variation of different models.

Table 3: Comparison of error-corrected translation results of different models with correct translations in some cases

| Original sentence | Transformer model | Seq2seq model | CNN model | RNN model | Correct translation |
| --- | --- | --- | --- | --- | --- |
| 这个村庄在山腰 (The village on side of the mountain) | The village is on the side of the mountain | The village is by the side of the mountain | The village is on the hillside | The village is halfway up the hill | The village is on the side of the mountain |
| 报纸上都是最新的新闻 (The newspapers are full of the newest new) | The newspapers are full of the latest news | The newspapers are full of the newest news | The newspapers are full of the latest news | The newspapers are full of the newest information | The newspapers are full of the latest news |
| 只有我们互相包容的时候，才能更好的合作 (Only when we toleranting of each other we can have better cooperate) | Only when we are tolerant of each other can we cooperate better | When we are tolerant of each other then we can cooperate better | When we are tolerant of each other, we can cooperate better | When we are tolerant of each other, we can cooperate better | Only when we are tolerant of each other can we cooperate better |

Through the above three case studies of error-corrected parallel sentences, it can be seen that the correction results of the Transformer model were the same as the correct English reference translations, and its error correction results were better than those of the other three correction models. In Case 1, the correct translation focuses on "on the side of the mountain"; the overall meaning of the Transformer and RNN translations was consistent with it, but those of the Seq2seq and CNN models were not. In Case 2, the correct translation highlights "the latest", but "new" does not mean the latest. In Case 3, the correct translation highlights "only". Although the error-corrected translations of the Seq2seq, CNN, and RNN models did not differ much from the overall meaning of the correct sentence, they did not translate the word "only".

Table 4: Comparison of experimental results between different error correction models

| Model | Precision | Recall rate | F0.5 | BLEU |
| --- | --- | --- | --- | --- |
| Transformer model | 93.64% | 94.01% | 2.35 | 0.94 |
| Seq2seq model | 87.26% | 85.39% | 2.15 | 0.86 |
| CNN model | 83.55% | 83.61% | 2.09 | 0.81 |
| RNN model | 84.37% | 83.94% | 1.99 | 0.82 |

It can be observed in Table 4 that the Transformer model had a precision of 93.64%, a recall rate of 94.01%, and an F0.5 value of 2.35, all much higher than those of the Seq2seq, CNN, and RNN models. In terms of BLEU, the value for the Transformer model was 0.94, which is very close to 1, indicating very good error correction performance for machine translation of English. Although the BLEU values of the Seq2seq, CNN, and RNN models were 0.86, 0.81, and 0.82, respectively, suggesting good translation performance, they still fell short of 1. Therefore, it was concluded that the Transformer model was better than the Seq2seq, CNN, and RNN models in terms of error correction for machine translation of English.

5 Discussion

Translation errors in machine translation directly affect the quality of translation and the reliability of the machine translation model, so further improving the quality of machine translation has become an urgent problem to solve.
As English is a widely used language, the study of its translation errors is of great practical value, so this paper studied the characteristics of errors in English translation and their correction. The comparison between the Transformer model and the other models showed that the Transformer model had a lower error rate in the English translation task and produced translations closer to the correct reference, i.e., the translation quality was higher. From the analysis in Table 3, it was found that the Transformer model handled translation details better and made fewer translation errors than models such as CNN and RNN. Then, from the comparison in Table 4, it can be seen that the Transformer model outperformed the Seq2seq, CNN, and RNN models in terms of precision, recall rate, and F0.5 value, with precision and recall rate above 90%. The BLEU value of the Transformer model was 0.94, which was 9.3%, 16.04%, and 14.63% higher than the Seq2seq, CNN, and RNN models, respectively, further proving the reliability of the Transformer model in English translation error correction.

The results of the study prove the effectiveness of the Transformer model in error feature analysis and error correction of English translation, which further improves the quality of English translation, but there are also some limitations. The research in this paper is based on English translation, and the applicability of the Transformer model for error correction in other languages is unclear. Although some improvement in BLEU scores has been achieved compared with other current methods, further validation on larger datasets is still needed; these issues should be addressed in future work.

6 Conclusion

This paper briefly introduced the characteristics of English translation errors and the Transformer model and analyzed whether the Transformer model can analyze error features and correct errors through machine translation. A sufficient amount of data was collected and processed before the experiment. The training set data was input into the Transformer model for training, the model was continuously adjusted to reach its optimum, and the test set data was then input to obtain the final machine translation results. Finally, the experimental results were evaluated using two major evaluation indexes, namely M² and BLEU. The results showed that the machine translation precision of the Transformer model reached 93.64%, the recall rate reached 94.01%, the F0.5 value reached 2.35, and the BLEU value reached 0.94. The analysis of the correction cases of some English mistranslations showed that the Transformer model was more effective than the other three models in the error correction of English translation. This proves that machine translation based on the Transformer model can be used to identify and correct English translation errors in the future.

References

[1] Zhang B, Xiong D, Su J (2020) Neural Machine Translation with Deep Attention. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42, pp. 154-163. https://doi.org/10.1109/TPAMI.2018.2876404

[2] Satir E, Bulut H (2021) A Novel Hybrid Approach to Improve Neural Machine Translation Decoding using Phrase-Based Statistical Machine Translation. 2021 International Conference on Innovations in Intelligent Systems and Applications (INISTA).
https://doi.org/10.1109/INISTA52262.2021.9548401

[3] Hnamte V, Thangkhanhau H, Hussain J, Lalnunmawii C, Tlaisun L, Vanlalruata (2022) Mizo to English Machine Translation: An Evaluation Benchmark. 2022 International Conference on Futuristic Technologies (INCOFT), pp. 1-6. https://doi.org/10.1109/INCOFT55651.2022.10094376

[4] Singh SM, Singh TD (2022) Low resource machine translation of English-Manipuri: A semi-supervised approach. Expert Systems with Applications, 209, pp. 118187. https://doi.org/10.1016/j.eswa.2022.118187

[5] Laskar SR, Paul B, Dadure P, Manna R, Pakray P, Bandyopadhyay S (2023) English-Assamese neural machine translation using prior alignment and pre-trained language model. Computer Speech & Language, 82, pp. 101524. https://doi.org/10.1016/j.csl.2023.101524

[6] Loubser M, Puttkammer MJ (2020) Viability of Neural Networks for Core Technologies for Resource-Scarce Languages. Information (Switzerland), 11, pp. 41. https://doi.org/10.3390/info11010041

[7] Xu C, Li Q (2021) Machine Translation and Computer Aided English Translation. Journal of Physics: Conference Series, 1881, pp. 1-8. https://doi.org/10.1088/1742-6596/1881/4/042023

[8] Huyen NT (2020) Common grammatical errors in English writing - A case study with second-year students of information technology at HAUI. Can Tho University Journal of Science, 11, pp. 37. https://doi.org/10.22144/ctu.jen.2020.005

[9] Nguyen T, Nguyen L, Tran P, Nguyen H (2021) Improving Transformer-Based Neural Machine Translation with Prior Alignments. Complexity, 2021, pp. 1-10. https://doi.org/10.1155/2021/5515407

[10] Ling G, Yang X (2022) Knowledge Base Question Answering Based on Multi-head Attention Mechanism and Relative Position Coding. Journal of Physics: Conference Series, 2203, pp. 1-7.

[11] Guo DM (2020) Jointly Part-of-Speech Tagging and Semantic Role Labeling Using Auxiliary Deep Neural Network Model. Computers, Materials & Continua, 2020, pp. 529-541. https://doi.org/10.32604/cmc.2020.011139

[12] Amalia A, Sitompul OS, Mantoro T, Nababan EB (2021) Morpheme Embedding for Bahasa Indonesia Using Modified Byte Pair Encoding. IEEE Access, 9, pp. 155699-155710. https://doi.org/10.1109/ACCESS.2021.3128439

[13] Li Y, Li J, Zhang M (2021) Deep Transformer modeling via grouping skip connection for neural machine translation. Knowledge-Based Systems, 234, pp. 1-12. https://doi.org/10.1016/j.knosys.2021.107556

[14] Ren Q, Su Y, Wu N (2020) Research on Mongolian-Chinese machine translation based on the end-to-end neural network. International Journal of Wavelets, Multiresolution and Information Processing, 18, pp. 46-59. https://doi.org/10.1142/S0219691319410030

[15] Datta D, Evangeline P, Mittal D, Jain A (2020) Neural Machine Translation using Recurrent Neural Network. International Journal of Engineering and Advanced Technology, 9, pp. 1395-1400. https://doi.org/10.35940/ijeat.D7637.049420

[16] Qi L, Guo FY, Zhang J, Wang YW (2022) An internet review topic hierarchy mining method based on modified continuous renormalization procedure. Fractals, 30, pp. 1-25. https://doi.org/10.1142/S0218348X22501341

[17] Phan-Vu HH, Tran VT, Nguyen VN, Dang HV, Do PT (2019) Neural Machine Translation between Vietnamese and English: an Empirical Study. Journal of Computer Science and Cybernetics, 35, pp. 147-166. https://doi.org/10.15625/1813-9663/35/2/13233