https://doi.org/10.31449/inf.v48i6.5249
Informatica 48 (2024) 81-92

Research on Automatic Identification of Machine English Translation Errors Based on Improved GLR Algorithm

Guanghuan Li
Department of Public Teaching, Nanyang Vocational College, Nanyang, Henan, China, 400072
E-mail: weihuan523@163.com

Keywords: machine translation, error identification, GLR algorithm, grammar rules, machine learning

Received: October 2, 2023

Machine translation is a powerful tool for overcoming linguistic barriers, but it often introduces errors that lower overall translation quality. This research aims to improve machine-translated documents by identifying and classifying translation faults. To identify errors, the traditional Generalized LR (GLR) technique is modified and enhanced with linguistic and statistical features of the machine-translated texts. Contextual information from GLR parsing is used to improve error detection, and additional parsing algorithms are integrated to handle the complexities of machine translation. The proposed improved GLR algorithm is compared with three baseline models: a statistical algorithm, a dynamic memory algorithm, and the traditional GLR algorithm. The evaluation focuses on two key metrics, accuracy and recognition speed, and also considers renewal capability. The improved GLR algorithm achieves an accuracy of 92.5%, higher than the statistical algorithm (85.2%), the dynamic memory algorithm (88.9%), and the traditional GLR algorithm (80.6%). It also reaches a recognition speed of 1200 words per second, indicating its suitability for real-time translation scenarios. The results show that the enhanced GLR algorithm outperforms the baseline models in detecting translation errors while maintaining an efficient recognition speed.
Its high renewal capability allows it to adapt to changing translation challenges and to improve continuously over time.

Summary: The study improves the automatic identification of errors in machine translation with an upgraded GLR algorithm, achieving 92.5% accuracy and a speed of 1200 words per second.

1 Introduction

Machine translation (MT) has transformed global communication by breaking down language barriers and enabling interaction between people from diverse linguistic backgrounds [1]. The ability to translate texts and conversations instantly has facilitated cross-cultural exchange, expanded business opportunities, and helped people from different language backgrounds connect and collaborate. Despite significant progress in MT development, however, translation errors remain a persistent challenge that reduces the accuracy and fluency of machine-translated content. Translation errors arise from the inherent complexity of natural languages, the diversity of linguistic structures, and the context-dependent nature of meaning. These errors not only affect the clarity and coherence of the translated content but can also lead to misunderstandings, misinterpretations, and inaccuracies in conveying the intended message [2]. Addressing and correcting them is critical to improving the quality and reliability of machine translations and making them more trustworthy for various applications. The automatic identification of machine translation errors has therefore emerged as an important research area, which seeks to develop systems that can accurately detect and categorize different types of translation faults [3]. Traditional error detection methods have relied on rule-based or statistical approaches, which often struggle with the complexity and subtlety of translation errors.
As the demand for high-quality translations grows, more sophisticated and robust error detection techniques are needed that can adapt to diverse language structures and capture nuanced errors across domains. In response, this research introduces an approach for automatically identifying and classifying errors in machine-translated English texts. The proposed method uses an improved version of the Generalized LR (GLR) algorithm that integrates machine learning techniques with linguistic analysis to achieve more accurate error detection. By combining the strengths of machine learning and linguistic rules, the algorithm aims to overcome the limitations of traditional error detection techniques and provide a reliable and efficient way to improve machine-translated content. This research makes the following contributions to the field of machine translation and error detection:
• An approach that combines machine learning techniques with linguistic analysis to identify translation errors more accurately.
• An improved Generalized LR algorithm tailored specifically to detecting machine English translation errors.
• A comprehensive evaluation of the proposed algorithm, demonstrating its advantage over existing error detection methods.
• An analysis of common error patterns in machine English translations, offering insights to developers and practitioners.

The rest of this paper is organized as follows: Section 2 reviews related work in machine translation and error detection. Section 3 details the proposed methodology, including the improved GLR algorithm and the integration of machine learning components. Section 4 describes the experimental setup and evaluation metrics. Section 5 presents the results and discusses their implications.
Finally, Section 6 concludes the paper by summarizing the contributions and outlining future research directions.

2 Related works

This section reviews the existing literature on machine translation, error detection, and the use of GLR algorithms in language processing, identifies gaps in current research, and explains how the proposed approach addresses them.

Reference [4] proposes an intelligent recognition model for business English translation based on an improved GLR algorithm. It reports a recognition accuracy of 92.5%, overcoming limitations of traditional algorithms and considerably improving operation speed and processing. The resulting intelligent translation of business English supports effective learning and development in this domain.

Reference [5] proposes a variable-step-size method to address challenges in portable instant translation systems. It aims to improve convergence speed and accuracy, especially in English-Chinese machine translation, and offers new ideas for intelligent machine translation.

Reference [6] proposes an improved GPS algorithm for intelligent recognition in machine translation. It improves recognition speed and accuracy, benefiting English translation teaching and language learning, and its experimental results show significant improvements in students' learning efficiency.

Reference [7] presents FLITRS, an intelligent translation recognition system based on an improved GLR algorithm. Its experiments show a recognition accuracy of over 94% in English translation, indicating high efficiency and feasibility for foreign language translation recognition.

Reference [8] introduces an intelligent model for English translation recognition based on embedded machine learning and an improved GLR algorithm.
It notes that the autoregressive translation models used in popular translation systems are not fully parallel, which limits efficiency and accuracy. The proposed approach achieves a recognition accuracy of over 96.58%, 23% higher than the classical GLR algorithm in semantic recognition. By incorporating statistical and dynamic storage algorithms, this model provides a promising method for machine translation.

Reference [9] improves the GLR algorithm to address the inaccuracies of traditional algorithms in intelligent English translation. It collects English signals, extracts feature vectors, and employs intelligent learning to improve recognition accuracy, which significantly improves pattern recognition performance.

Reference [10] aims to improve the translation accuracy of an intelligent recognition English translation model by improving the GLR algorithm. Starting from the GLR algorithm, it gradually constructs the intelligent recognition model and then refines the algorithm to address the model's shortcomings, resulting in the improved GLR algorithm [11]. The designed system is verified against other algorithms, and the results confirm that the model based on the improved GLR algorithm is effective, outperforming the classic model and significantly improving translation accuracy. Table 1 summarizes this literature.
Table 1: Summary of literature

| Reference | Method | Findings | Outcome |
|---|---|---|---|
| [4] | Intelligent recognition model for business English translation based on an improved GLR algorithm | High recognition accuracy (92.5%), improved operation speed and processing | Promotes effective learning and development in business English translation |
| [5] | Variable-step-size method to improve portable instant translation systems, with a focus on English-Chinese translation | Improved convergence speed and accuracy | Offers new ideas for intelligent machine translation |
| [6] | Improved GPS algorithm for intelligent recognition in machine translation | Enhanced recognition speed and accuracy; benefits English translation teaching and language learning | Experimental results show significant improvements in students' learning efficiency |
| [7] | FLITRS, an intelligent translation recognition system based on an improved GLR algorithm | Recognition accuracy of over 94% in English translation | Proves high efficiency and feasibility in foreign language translation recognition |
| [8] | Intelligent model for English translation recognition based on embedded machine learning and an improved GLR algorithm | Recognition accuracy over 96.58%, 23% higher than classical GLR in semantic recognition | Promising method for machine translation, addressing inefficiencies in classical models |
| [9] | Improved GLR algorithm enhancing intelligent English translation | Improved pattern recognition performance | Significantly improves translation accuracy in intelligent English translation |
| [10] | Improving the GLR algorithm to enhance the translation accuracy of an intelligent recognition English translation model | Improved GLR algorithm model system verified to have advantages over other algorithms | Effective model outperforming the classic model and significantly improving translation accuracy |

In the existing literature on machine translation, error detection,
and the use of GLR algorithms in language processing, several studies have proposed intelligent recognition models and algorithms to improve translation accuracy and efficiency. While these papers showcase promising results and advancements, there are still some research gaps that merit further investigation. One potential research gap is the limited focus on specific domains in intelligent translation models. While some papers have explored intelligent recognition models for business English translation, there remains a need to explore similar models for other specialized domains, such as technical, legal, or medical translation. Addressing these specific domains could significantly improve the accuracy and applicability of intelligent translation systems in various professional settings. Another research gap lies in the scope of multilingual translation. Most of the current papers primarily focus on English translation. However, there is a growing demand for multilingual translation systems that can handle various language pairs effectively. Exploring intelligent translation models for multilingual scenarios could lead to more inclusive and versatile language processing solutions. Additionally, the evaluation of intelligent translation models in real-world scenarios is a crucial research gap. While experimental results from controlled environments are valuable, understanding how these models perform in practical, diverse situations is essential for their successful implementation. Conducting studies that assess the performance of these models in real-world settings can provide valuable insights and ensure their practical usability. Furthermore, some papers have presented improved GLR algorithms for intelligent English translation. However, research gaps may still exist in optimizing the algorithms further or exploring their potential applications beyond English language translation. 
Investigating the adaptability of these algorithms to other languages and translation tasks could broaden their scope and impact.

3 System model

The proposed methodology improves machine-translated documents by identifying and classifying translation faults with an improved GLR algorithm and machine learning techniques. The process begins with data collection from the Open Parallel Corpus (OPUS), selecting data for diversity and relevance to the research objectives. The selected parallel corpus is downloaded in TMX format and pre-processed to maintain data integrity, and proper attribution and citations are given in line with the data creators' licensing terms. The collected dataset forms the foundation for training and evaluating machine translation models. Error categorization follows, covering grammatical, lexical, collocation, semantic, stylistic, punctuation, mistranslation, omission, addition, inconsistency, idiomatic expression, named entity, technical terminology, linguistic register, and capitalization errors. This categorization lays the groundwork for a nuanced understanding of translation challenges. A sample annotated dataset is then created that pairs machine-translated sentences with their reference translations and error categories; it serves as training data for the subsequent machine learning or statistical model. The Generalized LR (GLR) algorithm, which can parse context-free grammars that are ambiguous or contain parsing conflicts, is used for error identification. Its parsing is enhanced to improve efficiency and accuracy, and advanced conflict resolution mechanisms are introduced to handle parsing ambiguities, which is crucial for complex grammatical structures.
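The core GLR idea of following every applicable action when the grammar allows more than one can be illustrated with a small sketch. It is not the paper's improved algorithm: the grammar is a hypothetical toy over part-of-speech tags, and the parser simply forks a copy of the stack for every shift or reduce choice, whereas a real GLR parser drives those choices from an LR parse table and shares stacks in a graph-structured stack. Under this sketch, a sentence with zero parses would be flagged as a potential grammatical error, and more than one parse signals ambiguity.

```python
from typing import List, Tuple

Production = Tuple[str, Tuple[str, ...]]

# Hypothetical toy grammar over part-of-speech tags (not the paper's grammar).
# NP -> NP PP and VP -> VP PP make prepositional-phrase attachment ambiguous.
GRAMMAR: List[Production] = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("NP", ("Pron",)),
    ("NP", ("NP", "PP")),
    ("VP", ("V", "NP")),
    ("VP", ("VP", "PP")),
    ("PP", ("P", "NP")),
]


def count_parses(tags: List[str], grammar: List[Production], start: str = "S") -> int:
    """Count complete parses of a tag sequence; 0 means the input is rejected.

    Each configuration (stack, input position) is forked on every possible
    shift or reduce action. A real GLR parser makes these choices from an LR
    parse table and shares stacks in a graph-structured stack; here every
    branch gets its own copy, so the run time can grow exponentially.
    Assumes no empty productions and no nonterminal-to-nonterminal unit cycles.
    """
    parses = 0
    pending: List[Tuple[Tuple[str, ...], int]] = [((), 0)]
    while pending:
        stack, pos = pending.pop()
        if stack == (start,) and pos == len(tags):
            parses += 1  # all input consumed and reduced to the start symbol
        # Reduce: fork once for every production whose right-hand side is on top.
        for lhs, rhs in grammar:
            k = len(rhs)
            if len(stack) >= k and stack[len(stack) - k:] == rhs:
                pending.append((stack[: len(stack) - k] + (lhs,), pos))
        # Shift: also fork a branch that consumes the next input tag.
        if pos < len(tags):
            pending.append((stack + (tags[pos],), pos + 1))
    return parses


for sentence in ["Pron V Det N P Det N", "Pron V Det N", "Det V Pron"]:
    print(sentence, "->", count_parses(sentence.split(), GRAMMAR))
```

With this grammar, the tag sequence for "I saw the man with the telescope" (Pron V Det N P Det N) has two parses, because the prepositional phrase can attach to the noun or to the verb; Pron V Det N has one, and Det V Pron has none and would be rejected. Copying stacks keeps the sketch short; sharing them is what keeps real GLR parsing tractable on long, highly ambiguous inputs.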
The block diagram in Figure 1 shows the flow of the proposed methodology for improving machine-translated documents by identifying and classifying translation faults using an improved GLR algorithm and machine learning techniques [12]. The process starts with data collection, followed by error categorization and error identification using the modified GLR technique, to which parsing enhancements are applied to improve error detection.

Figure 1: Proposed methodology

Relevant features are then extracted from the annotated corpus to train a machine learning or statistical model. Finally, the revised GLR method and the trained model are evaluated using error detection measures.

A. Data collection

In this research, data collection from the Open Parallel Corpus (OPUS) is a crucial step in obtaining a diverse and comprehensive dataset for machine translation research. The researchers access the OPUS website and select language pairs, domains, and genres that align with the research objectives, considering data size, domain coverage, and data quality to ensure the dataset's representativeness and relevance. Once the desired parallel corpus is identified, it is downloaded in TMX format or another compatible format. The data is reviewed for consistency, alignment, and potential errors, and pre-processing is applied where necessary to ensure data integrity. To respect data creators and licensing terms, proper attribution and citations are provided for the data used from OPUS [13]. The researchers also consider data sampling or data augmentation to create a balanced and diverse dataset. The collected dataset from OPUS forms the foundation for training and evaluating machine translation models.
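Reading a downloaded TMX file into sentence pairs could look like the sketch below. It relies only on Python's standard xml.etree.ElementTree module and the element names of the TMX format (tu, tuv, seg, and the xml:lang attribute); the function names, the sample content, and the en/de language pair are hypothetical, since the paper does not describe its own loader. Units missing either language or containing an empty segment are dropped, a simple form of the consistency and alignment review described above.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# ElementTree expands the xml:lang attribute to this namespaced key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tmx_pairs(root: ET.Element, src: str, tgt: str) -> List[Tuple[str, str]]:
    """Return (source, target) segment pairs from a parsed TMX document."""
    pairs: List[Tuple[str, str]] = []
    for tu in root.iter("tu"):  # one translation unit per aligned segment pair
        segs: Dict[str, str] = {}
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; older files use a plain "lang" attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # Normalize "de-DE" to "de"; itertext() keeps text in inline tags.
                segs[lang.split("-")[0]] = "".join(seg.itertext()).strip()
        # Drop units that lack either language or have an empty segment.
        if segs.get(src) and segs.get(tgt):
            pairs.append((segs[src], segs[tgt]))
    return pairs


def read_tmx(path: str, src: str = "en", tgt: str = "de") -> List[Tuple[str, str]]:
    """Hypothetical helper: parse a TMX file from disk and extract its pairs."""
    return tmx_pairs(ET.parse(path).getroot(), src, tgt)


SAMPLE = """<tmx version="1.4"><header/><body>
<tu><tuv xml:lang="en"><seg>He enjoyed the book.</seg></tuv>
<tuv xml:lang="de-DE"><seg>Ihm hat das Buch gefallen.</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>Unaligned sentence</seg></tuv></tu>
</body></tmx>"""
print(tmx_pairs(ET.fromstring(SAMPLE), "en", "de"))
```

For a large OPUS download, ET.iterparse could be used instead of ET.parse to avoid holding the whole tree in memory.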
By leveraging this diverse corpus, the study aims to contribute to the advancement of machine translation research and, ultimately, to improve the quality and accuracy of machine-translated texts.

B. Error categorization

Various types of errors can occur in machine translation, leading to inaccuracies and lower translation quality. Identifying and categorizing these errors is essential for understanding the challenges of machine translation and devising strategies for improvement. Common error categories include the following [14]:
Grammatical Errors: Errors in sentence structure, verb conjugation, tense agreement, subject-verb agreement, word order, and the use of articles and prepositions.
Lexical Errors: Errors involving the selection or substitution of incorrect words or phrases, leading to inaccurate translations.
Collocation Errors: Errors in the choice of word combinations or collocations that are not idiomatic or contextually appropriate.
Semantic Errors: Errors that result in incorrect meaning or semantic distortion, often caused by ambiguity or a lack of contextual understanding.
Stylistic Errors: Errors in tone, formality, or register, leading to translations that do not match the intended style or tone of the source text.
Punctuation Errors: Errors in the use or placement of punctuation marks, affecting sentence clarity and coherence.
Mistranslation Errors: Errors where the overall translation does not accurately convey the intended meaning of the source text.
Omission Errors: Errors in which parts of the source text are left out of the translation, leading to incomplete or fragmented translations.
Addition Errors: Errors in which extra words or phrases are added in the translation, resulting in redundancy or incorrect information.
Inconsistency Errors: Errors where terminology or expressions are used inconsistently throughout the translation.
Idiomatic Expression Errors: Errors involving the misinterpretation or incorrect translation of idiomatic expressions or cultural references.
Named Entity Errors: Errors in the translation of proper names, such as names of people, places, organizations, or products.
Technical Terminology Errors: Errors in the translation of specialized technical terms or domain-specific terminology.
Linguistic Register Errors: Errors in matching the appropriate level of formality or informality in the translation.
Capitalization Errors: Errors in the use of uppercase and lowercase letters in the translation.

C. Sample of error categorization

• Grammatical Error: "I am going to the store buy some apple."
• Lexical Error: "He enjoy the book very much."
• Collocation Error: "The weather is very beautiful and sun shining."
• Semantic Error: "She make a lot of mistakes in the exam."
• Stylistic Error: "I want to going to the party, but I forgot my ticket."

D. Sample of annotated dataset

Table 2 presents a sample of the annotated dataset. The annotated dataset serves as the foundation for training the machine learning or statistical model to identify and classify translation faults accurately.

Table 2: Sample dataset

| Machine-Translated Sentence | Reference Translation | Error Category |
|---|---|---|
| "I am go to the store buy some apple." | "I am going to the store to buy some apples." | Grammatical Error |
| "He enjoy the book very much." | "He enjoyed the book very much." | Lexical Error |
| "The weather is very beautiful and sun shining." | "The weather is very beautiful, and the sun is shining." | Collocation Error |
| "She make a lot of mistake in the exam." | "She made a lot of mistakes in the exam." | Semantic Error |
| "I want to going to the party, but I forgot my ticket." | "I want to go to the party, but I forgot my ticket." | Stylistic Error |

E. Error identification using GLR algorithm

The Generalized LR (GLR) algorithm is a parsing technique for context-free grammars that may be ambiguous or contain shift/reduce and reduce/reduce conflicts. It is widely used in natural language processing and other parsing applications. The GLR algorithm used here is based on an extended context-free grammar, defined as the five-tuple in equation (1):

$$G_E = (V_N, V_T, V_F, P, S) \tag{1}$$

where $V_T$ is a nonempty finite set of terminal symbols, $V_N$ is a nonempty finite set of nonterminal symbols, $V_F$ is a nonempty finite set of constraint functions (a production may be used for a reduction only when its constraint conditions are satisfied), $P$ is the set of productions, and $S$ is the start symbol.

The GLR algorithm uses a parse table and a stack to explore multiple parsing paths and resolve ambiguities. An overview of the process follows [15]:
• Parse table construction: The algorithm first builds a parse table for the given context-free grammar. The table stores the parsing actions for each combination of state and input symbol. These actions include shift and reduce; where the grammar is ambiguous, a cell may hold several conflicting actions, which the GLR algorithm resolves by exploring each of them. The table is typically generated with an LR(0), SLR(1), or LALR(1) construction.
• Input sentence preparation: The input sentence is pre-processed, tokenized, and converted into a sequence of input symbols, which form the input to the parsing process.
• Stack initialization: The algorithm uses a stack to keep track of the parsing state. The stack is initialized with the start state, which corresponds to the start symbol of the context-free grammar.
• Parsing process: The algorithm processes the input sentence using the parse table and the stack to determine the appropriate parsing actions, in the following steps:
   a. State transition and shift: The state at the top of the stack and the current input symbol are passed to the state transition function (g).
   The function returns the set of possible next states. The algorithm then shifts: it moves to the next state given by the parse table and pushes the input symbol onto the stack.
   b. Reduce: After a series of shifts, if a reduction is possible, the algorithm applies the parsing action function (a) to the current state and input symbol; the function looks up the parse table to determine whether a reduction is valid. If it is, the algorithm applies the production rule, pops the symbols of its right-hand side from the stack, and replaces them with the nonterminal on its left-hand side.
   c. Conflict resolution: When a shift/reduce or reduce/reduce conflict arises, the GLR algorithm does not commit to a single action. It pursues every applicable action simultaneously, splitting the stack so that each alternative is explored, and discards the branches that later fail.
• Multiple parsing paths: A key advantage of the GLR algorithm is that it maintains multiple parsing paths when ambiguity arises, which lets it explore several parse trees and interpretations of the input sentence.
• Acceptance or error detection: The algorithm continues until it reaches an accepting state or detects an error. If the input sentence is parsed successfully, the algorithm accepts it and outputs the parse tree or parsed structure; otherwise, it reports a parsing error.

The GLR process is more flexible than deterministic LR parsing, which makes it suitable for the complex and ambiguous grammars encountered in natural language processing and other parsing applications. The enhanced GLR algorithm calculates the probability of the phrase's preamble using a four-element tuple, given in equation (2):

$$G_E = (V_N, V_T, S, \alpha) \tag{2}$$

where $S$ represents the start symbol cluster, which is an element of $V_T$.
$\alpha$ represents the phrase action clusters.

F. GLR parsing enhancements

GLR parsing is a powerful technique for ambiguous context-free grammars [16]. Researchers have proposed various enhancements to the GLR algorithm to improve its efficiency, accuracy, and applicability to different parsing scenarios. One significant enhancement is the incorporation of advanced conflict resolution mechanisms. Parsing ambiguity is a common challenge in context-free grammars, and traditional GLR parsing encounters shift/reduce or reduce/reduce conflicts when faced with ambiguous grammatical structures. These conflicts occur when several parsing actions are possible in the same parsing state, making it difficult to determine the correct course of action [17-19]. Advanced conflict resolution mechanisms address these conflicts in a more informed way, improving the accuracy and efficiency of the parsing process.

Figure 2: GLR parsing enhancements flowchart

G. Feature extraction

Feature extraction plays a pivotal role in the methodology and uses Term Frequency-Inverse Document Frequency (TF-IDF) as its numerical representation. TF-IDF quantifies the importance of a word within a document relative to a document collection: Term Frequency (TF) measures how often the word occurs in the document, while Inverse Document Frequency (IDF) measures how informative the term is across the entire collection. The TF-IDF score is the product of TF and IDF and represents the word's significance to a specific document within the corpus. The resulting numerical vectors capture word importance and serve as input features for the subsequent machine learning or statistical model.
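The TF-IDF weighting can be sketched in plain Python, following the TF definition and equations (3) and (4) given in this subsection. The sketch assumes that each tokenized sentence is treated as one document and that the logarithm is natural; the paper states neither the tokenizer nor the log base. Library implementations such as scikit-learn's TfidfVectorizer smooth the IDF and L2-normalize vectors by default, so their numbers would differ from this unsmoothed version.

```python
import math
from collections import Counter
from typing import Dict, List

Doc = List[str]  # one tokenized sentence, treated as one "document"


def term_frequency(doc: Doc) -> Dict[str, float]:
    """TF(t, d): occurrences of t in d divided by the number of terms in d."""
    counts = Counter(doc)
    return {term: n / len(doc) for term, n in counts.items()}


def inverse_document_frequency(corpus: List[Doc]) -> Dict[str, float]:
    """Equation (3) with the natural log: log(|D| / number of docs containing t)."""
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    return {term: math.log(len(corpus) / df) for term, df in doc_freq.items()}


def tf_idf(corpus: List[Doc]) -> List[Dict[str, float]]:
    """Equation (4): TF(t, d) * IDF(t, D) for every term of every document."""
    idf = inverse_document_frequency(corpus)
    return [{t: tf * idf[t] for t, tf in term_frequency(doc).items()} for doc in corpus]


def to_vector(weights: Dict[str, float], vocabulary: List[str]) -> List[float]:
    """Fixed-length feature vector; terms absent from the document get 0.0."""
    return [weights.get(term, 0.0) for term in vocabulary]


corpus = [["he", "enjoy", "the", "book"], ["she", "make", "the", "mistake"]]
weights = tf_idf(corpus)
vocab = sorted({t for doc in corpus for t in doc})
print(vocab)
print([round(x, 3) for x in to_vector(weights[0], vocab)])
```

In this example "the" occurs in both sentences, so its IDF is log(2/2) = 0 and its TF-IDF weight is 0, while words unique to one sentence receive the highest weights, matching the behavior described in the text.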
The methodology culminates in evaluating the revised GLR method and the trained model with error detection measures, giving a comprehensive assessment of the proposed approach's performance.

Term Frequency-Inverse Document Frequency (TF-IDF) is a popular numerical representation technique in natural language processing and information retrieval that quantifies the importance of a word to a document within a collection (corpus) of documents [20-23]. It is commonly used for feature extraction in text-based machine learning tasks such as text classification, information retrieval, and sentiment analysis. The TF-IDF score is the product of two components: the Term Frequency (TF) and the Inverse Document Frequency (IDF).

Term Frequency (TF): Term Frequency measures how often a term (word) occurs in a specific document and is calculated as

$$\mathrm{TF}(t, d) = \frac{\text{number of occurrences of term } t \text{ in document } d}{\text{total number of terms in document } d}$$

In other words, TF is the ratio of the number of times a word appears in a document to the total number of words in that document.

Inverse Document Frequency (IDF): Inverse Document Frequency measures how informative a term is across a collection of documents. It penalizes common words that appear in many documents and gives higher weight to rare words, which are more discriminative. IDF is calculated with equation (3):

$$\mathrm{IDF}(t, D) = \log \frac{\text{total number of documents in } D}{\text{number of documents in } D \text{ containing term } t} \tag{3}$$

The IDF value is the logarithm of the ratio of the total number of documents to the number of documents containing the term t.
TF-IDF Score: The TF-IDF score for a term t in a document d is the product of its Term Frequency and Inverse Document Frequency, as stated in equation (4):

$$\text{TF-IDF}(t, d, D) = \mathrm{TF}(t, d) \cdot \mathrm{IDF}(t, D) \tag{4}$$

The TF-IDF score quantifies how important a word is to a specific document within the collection. A high score indicates that a word is both frequent in the document and rare across the corpus, making it more informative and potentially more relevant to the document's content. Computing the TF-IDF scores of all words in a document yields a vector of numerical values, each corresponding to the score of one word. These TF-IDF vectors are meaningful feature representations of word importance and are commonly used as inputs to text-based machine learning algorithms.

4 Experimental results

The experiment was conducted on a dataset of 10,000 English sentences and their corresponding human-translated reference sentences. Evaluation metrics are quantitative measures of how well a machine learning model or algorithm performs on a specific task, and they provide insight into its strengths and weaknesses. This section compares the proposed improved GLR algorithm with three baseline models: the statistical algorithm, the dynamic memory algorithm, and the traditional GLR algorithm. The comparison is based on three key evaluation metrics, accuracy, recognition speed, and renewal capability, with precision, recall, and F1 score also reported. The choice of evaluation metrics depends on the nature of the problem, the type of model, and the desired outcomes. Table 3 lists the language features of sample entries in the dataset.
Table 3: Language features

| Sentence ID | Source Language | Target Language | Text Genre | Translation Quality | Error Category |
|---|---|---|---|---|---|
| 1 | English | Spanish | Technical Manual | High | Lexical Error |
| 2 | French | English | Legal Document | Moderate | Grammatical Error |
| 3 | Chinese | German | Literary Fiction | Low | Stylistic Error |
| 4 | Spanish | Russian | Medical Report | High | Semantic Error |
| 5 | Arabic | Japanese | Social Media | Moderate | Collocation Error |
| 6 | Russian | Italian | Scientific Paper | Low | Punctuation Error |
| 7 | German | English | Conversational | Moderate | Idiomatic Expression Error |
| 8 | Japanese | French | Technical Manual | High | Technical Terminology Error |
| 9 | Korean | Arabic | News Article | Low | Inconsistency Error |
| 10 | Italian | English | Poetry | Moderate | Stylistic Error |

Table 3 shows a diverse set of sentences drawn from various languages, genres, and translation quality levels. Each entry records the source and target languages, the text genre, the translation quality, and the identified error category. This diversity allows a more thorough evaluation of the algorithm's performance across different linguistic and contextual scenarios.

Accuracy

Accuracy describes how closely the algorithm's decisions match the labelled cases, and it reflects systematic errors and statistical bias. It is the number of correct recognitions, true positives (TP) plus true negatives (TN), divided by the total number of assessed instances, which also includes false positives (FP) and false negatives (FN), as computed in equation (5):

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{5}$$

Recognition speed, also known as processing speed or inference speed, is an important evaluation metric in machine learning and artificial intelligence. It measures how quickly a model or algorithm processes input data and produces output predictions or results.
Recognition speed is typically measured in units of data processed per unit of time, such as words per second, images per second, or samples per second. Renewal capability, also known as adaptability or flexibility, indicates how readily a machine learning model or algorithm can be updated or modified to handle new or changing data patterns, tasks, or requirements over time. A model with high renewal capability can adapt and improve its performance as new data becomes available or as the task's characteristics change.

Table 4: Performance analysis

| Algorithm | Accuracy (%) | Precision | Recall | F1 Score | Recognition Speed (words/s) | Renewal Capability |
|---|---|---|---|---|---|---|
| Improved GLR algorithm | 92.5 | 0.93 | 0.91 | 0.92 | 1200 | High |
| Statistical algorithm [17] | 85.2 | 0.84 | 0.87 | 0.85 | 800 | Moderate |
| Dynamic memory algorithm [17] | 88.9 | 0.89 | 0.88 | 0.88 | 950 | Moderate |
| Traditional GLR algorithm [17] | 80.6 | 0.81 | 0.79 | 0.80 | 700 | Low |

The evaluation of the proposed improved GLR algorithm and the baseline models gives insight into their performance in automatically identifying translation errors. As Table 4 shows, the improved GLR algorithm outperforms the baseline algorithms on every evaluation metric.

Figure 3: Accuracy comparison

Accuracy is the proportion of correctly classified instances in the dataset. The proposed improved GLR algorithm achieves an accuracy of 92.5%, correctly classifying a large portion of the instances. It outperforms the statistical algorithm (85.2%), the dynamic memory algorithm (88.9%), and the traditional GLR algorithm (80.6%), demonstrating its superior performance in error detection.
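Equation (5) and the precision, recall, and F1 columns of Table 4 all derive from confusion-matrix counts. The following minimal sketch assumes each instance receives a binary error/no-error decision, which the paper does not spell out; the confusion counts in the first usage line are hypothetical, and the second part recomputes F1 from the precision and recall values reported in Table 4 as a consistency check.

```python
from typing import Dict


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def confusion_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy per equation (5), plus precision, recall and F1, from counts."""
    n = tp + tn + fp + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / n if n else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


# Hypothetical counts, for illustration only.
print(confusion_metrics(tp=90, tn=5, fp=3, fn=2))

# Consistency check: recompute F1 from the precision/recall columns of Table 4.
TABLE4 = {
    "Improved GLR": (0.93, 0.91, 0.92),
    "Statistical": (0.84, 0.87, 0.85),
    "Dynamic memory": (0.89, 0.88, 0.88),
    "Traditional GLR": (0.81, 0.79, 0.80),
}
for name, (p, r, reported) in TABLE4.items():
    print(f"{name}: F1 = {f1_score(p, r):.4f} (reported {reported:.2f})")
```

Recomputing 2PR/(P+R) reproduces all four reported F1 values to two decimals; for example, a precision of 0.93 and a recall of 0.91 give about 0.9199.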
Figure 4: Precision comparison

Figure 5: Recall comparison

Precision measures the proportion of true positive predictions (correctly identified errors) out of all predicted positive instances (both correctly and incorrectly flagged errors), i.e. TP/(TP + FP). The improved GLR algorithm achieves a precision of 0.93, meaning that 93% of the errors it flags are genuine. Recall, on the other hand, measures the proportion of true positive predictions out of all actual positive instances (all existing errors), i.e. TP/(TP + FN). The improved GLR algorithm achieves a recall of 0.91, meaning that it captures 91% of the actual translation errors present in the dataset. Together, the high precision and recall values indicate that the algorithm detects most errors while keeping both false positives and false negatives low.

Figure 6: F1-Score comparison

The F1 score is the harmonic mean of precision and recall, providing a balanced evaluation metric that considers both simultaneously. The improved GLR algorithm achieves an F1 score of 0.92, which represents a well-balanced trade-off between precision and recall. This balanced performance indicates that the algorithm maintains a high level of correctness in its error identification while also considering the completeness of its predictions.

Figure 7: Recognition speed comparison

The recognition speed measures how quickly the algorithm can process input data and provide output predictions. The improved GLR algorithm achieves a recognition speed of 1200 words per second, the highest among all the algorithms, which suggests it can handle large volumes of text in real-time translation scenarios.

Renewal capability refers to the algorithm's ability to be updated or adapted to handle new or changing translation challenges over time.
The improved GLR algorithm exhibits a high renewal capability, indicating its potential for continuous learning and improvement as new data becomes available. This adaptability is crucial in keeping the algorithm up-to-date with evolving language patterns and translation requirements.

A. Discussions

The performance analysis presented in Table 4 and the accompanying figures (Figures 3 to 7) provides a comprehensive evaluation of the proposed improved GLR algorithm in comparison with three baseline models: a statistical algorithm, a dynamic memory algorithm, and a traditional GLR algorithm. The results show distinct advantages of the improved GLR algorithm across the key metrics.

In terms of accuracy, the improved GLR algorithm reaches 92.5%, surpassing the statistical algorithm (85.2%), the dynamic memory algorithm (88.9%), and the traditional GLR algorithm (80.6%). This indicates that it classifies a larger share of instances correctly than any baseline, which is essential for reliable error detection in machine translation.
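The precision, recall and F1-score compared in Figures 4 to 6 follow the standard definitions given in the results above. The Python sketch below implements them and checks that each F1 value reported in Table 4 agrees, to two decimal places, with the harmonic mean of the reported precision and recall. This check assumes that F1 was computed from the same aggregate precision and recall rather than averaged per error category, which the paper does not state; the function names are illustrative.

```python
def precision(tp: int, fp: int) -> float:
    """TP / (TP + FP): share of flagged errors that are genuine."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """TP / (TP + FN): share of actual errors that were flagged."""
    return tp / (tp + fn) if tp + fn else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    return 2 * p * r / (p + r) if p + r else 0.0


# (precision, recall, reported F1) as listed in Table 4.
TABLE4 = {
    "Improved GLR algorithm": (0.93, 0.91, 0.92),
    "Statistical algorithm": (0.84, 0.87, 0.85),
    "Dynamic memory algorithm": (0.89, 0.88, 0.88),
    "Traditional GLR algorithm": (0.81, 0.79, 0.80),
}


def f1_consistent(tolerance: float = 0.005) -> dict[str, bool]:
    """True where the reported F1 lies within `tolerance` of 2PR / (P + R)."""
    return {name: abs(f1_score(p, r) - f1) <= tolerance
            for name, (p, r, f1) in TABLE4.items()}


if __name__ == "__main__":
    for name, (p, r, f1) in TABLE4.items():
        print(f"{name}: recomputed {f1_score(p, r):.4f}, reported {f1:.2f}")
    print(f1_consistent())
```

All four recomputed values lie within 0.005 of the reported F1 scores, so the F1 column of Table 4 is consistent with its precision and recall columns under this assumption.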
Precision and recall, depicted in Figures 4 and 5 respectively, further emphasize the strong performance of the improved GLR algorithm. With a precision of 0.93, the large majority of the errors the algorithm flags are genuine. In addition, a recall of 0.91 signifies that the algorithm captures most of the actual translation errors present in the dataset. These high precision and recall values highlight the algorithm's capability to detect errors accurately while keeping both false positives and false negatives low, which is crucial for maintaining the integrity of the translation output.

The F1-score comparison in Figure 6, which represents the harmonic mean of precision and recall, reinforces the balanced performance of the improved GLR algorithm. With an F1 score of 0.92, the algorithm achieves a well-rounded trade-off between precision and recall, indicating that it maintains a high level of correctness in error identification while also considering the completeness of its predictions.

The recognition speed comparison in Figure 7 reveals another strength of the improved GLR algorithm: its recognition speed of 1200 words per second is the highest among all the algorithms. This suggests it can process large volumes of text efficiently, making it well-suited for real-time translation scenarios.

Furthermore, the renewal capability assessment indicates that the improved GLR algorithm has a high capacity for adaptation and continuous learning. This adaptability is crucial for keeping the algorithm up-to-date with evolving language patterns and translation challenges, ensuring its relevance and effectiveness over time.

The evaluation results collectively demonstrate that the improved GLR algorithm excels in accuracy, precision, recall, and recognition speed, positioning it as a robust and efficient solution for automatic identification of translation errors.
Its high renewal capability further strengthens its potential for continuous improvement in addressing evolving translation challenges.

B. Findings

The findings from the performance analysis reveal compelling insights into the capabilities of the proposed improved GLR algorithm for automatic translation error identification. The algorithm achieves an accuracy of 92.5%, correctly classifying a large share of the instances in the dataset. This accuracy, higher than that of every baseline model, emphasizes the algorithm's proficiency in improving the overall reliability of error detection. Furthermore, the precision of 0.93 indicates that the errors the algorithm flags are overwhelmingly genuine, demonstrating its ability to minimize false positives. The recall of 0.91 underscores its capacity to capture a significant proportion of actual translation errors, reflecting its robustness in avoiding false negatives. The balanced F1 score of 0.92 highlights the algorithm's ability to strike a good trade-off between precision and recall, affirming its well-rounded performance.

In terms of recognition speed, the improved GLR algorithm processes 1200 words per second, demonstrating its efficiency in handling large volumes of text in real-time translation scenarios. Additionally, the algorithm's high renewal capability indicates its adaptability to continuous learning and improvement, which is crucial for staying current with evolving language patterns and translation challenges. In summary, the findings underscore the improved GLR algorithm's strength in accuracy, precision, recall, and recognition speed, positioning it as a promising advancement in automatic translation error identification.
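Recognition speed in Table 4 is reported in words per second, but the measurement procedure is not described. One common way to obtain such a figure is to time a detector over a fixed set of sentences and divide the word count by the elapsed wall-clock time, as in the Python sketch below. The `toy_detector` function is a hypothetical stand-in (it only flags immediately repeated words) and is not the improved GLR algorithm; whitespace tokenization and averaging over several passes are likewise our assumptions.

```python
import time
from typing import Callable, Sequence


def toy_detector(sentence: str) -> list[int]:
    """Hypothetical stand-in for an error detector: returns the indices of
    words that immediately repeat the previous word (e.g. "the the")."""
    words = sentence.lower().split()
    return [i for i in range(1, len(words)) if words[i] == words[i - 1]]


def words_per_second(detect: Callable[[str], object],
                     sentences: Sequence[str],
                     repeats: int = 3) -> float:
    """Throughput of `detect` in words per second, averaged over `repeats` passes."""
    words_per_pass = sum(len(s.split()) for s in sentences)  # whitespace tokens
    if words_per_pass == 0 or repeats < 1:
        raise ValueError("need at least one word and one pass")
    start = time.perf_counter()
    for _ in range(repeats):
        for s in sentences:
            detect(s)
    # Guard against a zero reading on extremely fast runs.
    elapsed = max(time.perf_counter() - start, 1e-12)
    return words_per_pass * repeats / elapsed


if __name__ == "__main__":
    corpus = ["the the cat sat on the mat", "machine translation output is checked"] * 500
    print(toy_detector("the the cat sat"))  # [1]
    print(f"{words_per_second(toy_detector, corpus):.0f} words/s")
```

Because the result depends on hardware, tokenization and input length, throughput figures are comparable only when all algorithms are timed on the same machine and the same text.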
5 Conclusions

This research project addressed the challenges of machine translation errors, with the aim of enhancing the quality of machine-translated English texts. By identifying and classifying translation faults, the proposed improved Generalized LR (GLR) algorithm, combined with machine learning techniques, offered a powerful and accurate approach to error detection. Through data collection and corpus annotation, various types of translation errors, including grammatical, lexical, collocation, semantic, and stylistic faults, were categorized. The modified GLR algorithm, enriched with linguistic and statistical elements from machine-translated texts, demonstrated its effectiveness in handling complex and ambiguous grammars, leading to improved error detection. Furthermore, the algorithm's high renewal capability supports its adaptability to evolving translation challenges, allowing it to be continuously improved and kept up-to-date with changing language patterns and requirements.

Overall, this research contributes valuable methods for analysing and enhancing machine-translated English texts that can help improve translation quality and advance machine translation applications. The combination of parsing, feature extraction, and machine learning techniques proves to be an effective approach for precise and reliable error identification, enabling more effective cross-language communication and fostering better understanding among global communities. The findings of this study have significant implications for the future development and use of machine translation technology, paving the way for enhanced language communication on a global scale.

Although the current research has made significant strides in advancing machine translation error detection and improving the quality of machine-translated English texts, there are compelling avenues for future exploration.
Firstly, expanding the adaptability of the improved GLR algorithm to a broader range of languages could enhance its versatility and effectiveness across diverse linguistic landscapes. Additionally, investigating the algorithm's application in real-time translation systems would provide crucial insights into its practical usability and responsiveness in dynamic language processing scenarios. Tailoring the algorithm to specific domains, such as legal, medical, or technical translation, represents another promising direction, allowing for a more nuanced understanding of its performance in specialized contexts. Considering the challenges posed by user-generated content, especially in informal communication channels like social media, and adapting the algorithm to handle informal language styles could further improve its applicability. A more detailed comparison with human translation error identification would offer clearer insights into the algorithm's strengths and potential areas for improvement. Exploring mechanisms for continuous learning within the algorithm, integrating advanced natural language processing techniques, and addressing ethical considerations related to biases in training data and societal impacts are crucial aspects that could shape the future trajectory of this research. By pursuing these future directions, the study aims to contribute not only to the academic understanding of machine translation but also to its practical advancement and responsible deployment in real-world scenarios.

References
[1] Y. Sui. Computer Intelligent Proofreading Method for English Translation Based on Foreign Language Translation Model. In 2021 3rd International Conference on Artificial Intelligence and Advanced Manufacture, pp. 1121-1125, 2021. https://doi.org/10.1145/3495018.3495348
[2] J. Wang. Intelligent recognition model of English translation based on cloud computing GLR algorithm. In The International Conference on Forthcoming Networks and Sustainability, Hybrid Conference, Nicosia, Cyprus, 2022. https://doi.org/10.1049/icp.2022.2408
[3] H. Wang and C. Zhao. English Long and Short Sentence Translation and Recognition Method Based on Deep GLR Model. Computational Intelligence and Neuroscience, 2022, 2022. https://doi.org/10.1155/2022/3119477
[4] L. Deng, X. Hu and F. Liu. Intelligent Recognition Model of Business English Translation Based on Improved GLR Algorithm. Computational Intelligence and Neuroscience, 2022, 2022. https://doi.org/10.1155/2022/4105942
[5] L. Wang. Intelligent English Automatic Translation System Based on Improved GLR Algorithm. In 2023 IEEE International Conference on Control, Electronics and Computer Technology (ICCECT), 228: 1258-1262, 2023. https://doi.org/10.1016/j.procs.2023.11.061
[6] X. Yang. An Intelligent Recognition Model of English Translation Teaching Method Based on Improved GLR Algorithm. In 2022 International Symposium on Advances in Informatics, Electronics and Education (ISAIEE), Frankfurt, Germany, pp. 626-630, 2022. https://doi.org/10.1109/ISAIEE57420.2022.00132
[7] Y. Guo and B. Lu. Design of foreign language intelligent translation recognition system based on improved GLR algorithm. In The International Conference on Forthcoming Networks and Sustainability, Hybrid Conference, Nicosia, Cyprus, 2022. https://doi.org/10.1049/icp.2022.2488
[8] L. Lei. Intelligent Recognition English Translation Model Based on Embedded Machine Learning and Improved GLR Algorithm. Mobile Information Systems, 2022, 2022. https://doi.org/10.1155/2022/5632131
[9] S. Zhang. English intelligent translation pattern recognition system on account of improved GLR algorithm. In The International Conference on Forthcoming Networks and Sustainability (FoNeS 2022), pp. 332-336, 2022. https://doi.org/10.1049/icp.2022.2447
[10] M. Deng and L. Yang. Intelligent Translation Recognition Model Supported by Improved GLR Algorithm. In International Conference on Multi-modal Information Analytics, pp. 472-479. Cham: Springer International Publishing, 2022. https://doi.org/10.1016/j.procs.2023.11.061
[11] I. Hwang, S. Kim, Y. Kim and C. E. Seah. A survey of fault detection, isolation, and reconfiguration methods. IEEE Transactions on Control Systems Technology, 18(3): 636-653, 2009. https://doi.org/10.1109/TCST.2009.2026285
[12] Y. Sui. Computer Intelligent Proofreading Method for English Translation Based on Foreign Language Translation Model. In 2021 3rd International Conference on Artificial Intelligence and Advanced Manufacture, pp. 1121-1125, 2021. https://doi.org/10.1145/3495018.3495348
[13] Y. Zhang, C. Zong and B. Xu. An Approach to Automatic Identification of Chinese Base Noun Phrases. In International Symposium on Chinese Spoken Language Processing, Hefei, China, 2022. https://doi.org/10.1109/ICCSE.2010.5593439
[14] J. Li, L. Stankovic, V. Stankovic, S. Pytharouli, C. Yang and Q. Shi. Graph-based feature weight optimisation and classification of continuous seismic sensor array recordings. Sensors, 23(1): 243, 2022. https://doi.org/10.3390/s23010243
[15] A. Degirmenci and O. Karal. Efficient density and cluster based incremental outlier detection in data streams. Information Sciences, 607: 901-920, 2022. https://doi.org/10.1016/j.ins.2022.06.013
[16] N. S. Modjrian. Prediction of outdoor thermal comfort changes and uncovering mitigation strategies based on machine learning algorithm: a decision support tool for climate-sensitive design: a case study of Glasgow, UK, 2022.
[17] X. Yang. An Intelligent Recognition Model of English Translation Teaching Method Based on Improved GLR Algorithm. In 2022 International Symposium on Advances in Informatics, Electronics and Education (ISAIEE), Frankfurt, Germany, pp. 626-630, 2022. https://doi.org/10.1109/ISAIEE57420.2022.00132
[18] Y. Liu. Design of English Intelligent Information Teaching System Based on Improved GLR Algorithm. In 2022 International Conference on Knowledge Engineering and Communication Systems (ICKECS), Chickballapur, India, pp. 1-5, 2022. https://doi.org/10.1109/ICKECS56523.2022.10060760
[19] D. Ji and W. Wang. Design of English Translation Software Based on Improved GLR Algorithm. In 2023 International Conference on Networking, Informatics and Computing (ICNETIC), Palermo, Italy, pp. 655-659, 2023. https://doi.org/10.1109/ICNETIC59568.2023.00140
[20] L. Pan. Design of Foreign Language Intelligent Translation Recognition System Based on Improved GLR Algorithm. In 2022 IEEE Asia-Pacific Conference on Image Processing, Electronics and Computers (IPEC), Dalian, China, pp. 1296-1299, 2022. https://doi.org/10.1109/IPEC54454.2022.9777507
[21] J. Liu. Informatization of Constructive English Learning Platform Based on Improved GLR Algorithm. In 2022 IEEE 2nd International Conference on Mobile Networks and Wireless Communications (ICMNWC), Tumkur, Karnataka, India, pp. 1-4, 2022. https://doi.org/10.1109/ICMNWC56175.2022.10031777
[22] K. J. Han and S. S. Narayanan. Novel inter-cluster distance measure combining GLR and ICR for improved agglomerative hierarchical speaker clustering. In 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, Las Vegas, NV, USA, pp. 4373-4376, 2008. https://doi.org/10.1109/ICASSP.2008.4518624
[23] S. Rui. Research on the Development of Computer Intelligent Proofreading System from the Perspective of English Translation Application. Microcomputer Application, 36(322(02)): 149-15, 2021. https://doi.org/10.1109/ICCEA50009.2020.00143