University of Ljubljana Press
Založba Univerze v Ljubljani
Ljubljana, 2025
ISSN 1581-8918

English Language Overseas Perspectives and Enquiries
Vol. 22, No. 1 (2025)

RETHINKING ENGLISH STUDIES THROUGH AI: CHALLENGES, ETHICS, AND INNOVATION

Editors of ELOPE Vol. 22, No. 1: Tomaž ONIČ, David HAZEMALI and Mladen BOROVIČ
Journal Editors: Smiljana KOMAR and Mojca KREVEL

CIP - Kataložni zapis o publikaciji
Narodna in univerzitetna knjižnica, Ljubljana

811.111(082)

RETHINKING English studies through AI : challenges, ethics, and innovation / editors Tomaž Onič, David Hazemali and Mladen Borovič. - Ljubljana : University of Ljubljana Press = Založba Univerze, 2025. - (ELOPE : English language overseas perspectives and enquiries, ISSN 1581-8918 ; vol. 22, no. 1)

ISBN 978-961-297-608-8
COBISS.SI-ID 239835651

Contents

PART I: INTRODUCTION

Tomaž Onič, Mladen Borovič, David Hazemali
9   Rethinking English Studies Through AI: Challenges, Ethics, and Innovation

PART II: LANGUAGE

Tadej Todorović, Andrej Flogie, Daniel Hari
19  Generative AI in Pragmatics: Assessing the Accuracy of Automated Speech Act Classification in Pinter’s The Birthday Party
    Generativna umetna inteligenca v pragmatiki: analiza natančnosti samodejne klasifikacije govornih dejanj v Pinterjevi drami Zabava za rojstni dan

Agata Križan, Aja Barbič
35  Appraisal Analysis and AI Chatbots: Do We Even Need Humans?
    Analiza jezika vrednotenja in pogovorni sistemi: ali ljudi sploh potrebujemo?
PART III: ACADEMIC WRITING

Silvana Neshkovska
55  The Benefits and Risks of AI-Assisted Academic Writing: Insights from Current Research
    Prednosti in tveganja pri znanstvenem pisanju s pomočjo umetne inteligence: spoznanja iz aktualnih raziskav

Rashmika Lekamge, Clayton Smith
69  Impact of Auto-Correction Features in Text-Processing Software on the Academic Writing of ESL Learners
    Vpliv funkcije samodejnega popravljanja v programih za urejanje besedil na akademsko pisanje učencev in učenk angleščine kot drugega tujega jezika

Tommy Hastomo, Andini Septama Sari, Utami Widiati, Francisca Maria Ivone, Evynurul Laily Zen, Muhammad Fikri Nugraha Kholid
93  Does Student Engagement with Chatbots Enhance English Proficiency?
    Ali uporaba pogovornih sistemov prispeva k izboljšanju znanja angleščine pri študentih in študentkah?

PART IV: ENGLISH LANGUAGE AND LITERATURE TEACHING

Saša Jazbec, Bernarda Leva, Marta Licardo
113 AI Is Here to Stay: An Empirical Study of Attitudes Among Teachers of English and German
    UI je prišla in bo ostala: empirična raziskava o stališčih učiteljev in učiteljic angleščine in nemščine

Bojan Prosenjak, Eva Jakupčević
133 Attitudes of Primary and Secondary EFL Teachers in Croatia Towards the Use of AI in Classroom Settings
    Pogled osnovno- in srednješolskih učiteljev in učiteljic angleščine kot tujega jezika na Hrvaškem na uporabo UI pri pouku

PART V: TRANSLATION STUDIES

Nataša Gajšt
153 Applications of AI-driven Tools in Translating and Drafting Commercial Correspondence – A Slovenian-English Perspective
    Uporaba orodij umetne inteligence pri prevajanju in sestavljanju poslovnih dopisov – slovensko-angleški vidik

Simon Zupan, Zmago Pavličič, Melanija Larisa Fabčič
171 Machine Translation of Independent Nominal Phrases in Technical Texts
    Strojno prevajanje samostojnih samostalniških besednih zvez v tehničnih besedilih

Marija Brala Vukanović
185 Translating (Metaphors) in the Age of AI: Opportunities, Challenges, and Implications for the EFL Classroom
    Prevajanje (metafor) v dobi umetne inteligence: priložnosti, izzivi in posledice za učilnico angleščine kot tujega jezika

Ghodrat Hassani, Marziyeh Malekshahi, Hossein Davari
203 AI-Powered Transcreation in Global Marketing: Insights from Iran
    Transkreacija z umetno inteligenco v globalnem marketingu: spoznanja iz Irana

LIST OF CONTRIBUTORS

Part I: Introduction

Rethinking English Studies Through AI: Challenges, Ethics, and Innovation

Tomaž Onič, Mladen Borovič, David Hazemali
University of Maribor, Slovenia

1 Introduction

Over a relatively brief period, the rapid development of artificial intelligence (AI) has reshaped our perception of traditional concepts that have been with us for decades, some even for centuries. Communication, language, academic writing, translation, and education have not escaped that transformation. Generative AI tools, particularly chatbots that use large language models (LLMs) to generate tailor-made texts, presentations, images, and videos, have entered the classroom, academic research, and various professional settings – often faster than our pedagogical frameworks and ethical standards can adapt. As these technologies grow more sophisticated and accessible, it has become impossible for the various fields of English studies – encompassing linguistics, writing, teaching, and translation – to ignore their presence or avoid their impact.

This thematic issue of ELOPE responds to this phenomenon. It brings together eleven original research articles that critically and creatively engage with the implications of AI for English language use, learning, and mediation. The volume addresses a range of issues and contexts: from pragmatic annotation in literary texts to metaphor translation in the EFL classroom, and from ESL writing development to teacher perceptions of AI tools. The contributions draw the reader’s attention to both the advantages and the pitfalls of integrating artificial intelligence into the various fields of English studies.
The influence of AI on education is undeniably evident on many levels of teaching and learning. One of its salient aspects is the ability to individualize instruction: AI-driven platforms analyse each learner’s performance and recommend tailor-made exercises, readings, or feedback, which – according to Msambwa, Wen, and Daniel (2025) and Massaty, Fahrurozi, and Budiyanto (2024) – can sustain motivation and enhance progression. Another function of AI systems involves fostering critical present-day skills, such as computational thinking or complex problem-solving, through modern teaching and learning approaches like just-in-time guidance or scaffolded challenges, i.e., problems structured to gradually increase in difficulty or complexity, with support (or scaffolding) provided along the way (Massaty, Fahrurozi, and Budiyanto 2024). Moreover, AI has expanded access to automated analysis and language support (Krishnan and Zaini 2025), which is not limited to English studies but interconnects with other disciplines. AI can also create collaborative learning environments by moderating group discussions, supporting peer-to-peer interaction, and converting static materials into adaptive simulations (Msambwa, Wen, and Daniel 2025; Orlanda-Ventayen 2024). Kusmiadi and Wahyudin (2024) also report that, behind the scenes, administrative activities like grading, attendance monitoring, and early-alert systems are increasingly automated, supposedly freeing more time for teaching staff to focus on innovative curriculum design and individualized mentorship.

Yet all these improvements raise new and relevant concerns. Apart from the privacy and security issues raised by Yu et al. (2024) or Asad et al.
(2024), which accompany the collection and analysis of student data, the use of AI opens the door to algorithmic biases that can skew recommendations, potentially privileging certain learners while marginalizing others (Cui and Alias 2024). Researchers also caution against overreliance on AI, which can eventually diminish deeper cognitive engagement as students elect to leave critical thinking to the machines (Butson and Spronken-Smith 2024; Castillo-Martínez et al. 2024). Ethical questions regarding authorship and academic integrity further complicate AI’s role in writing and assessment (Floridi 2023; Butler and Jiang 2025). These challenges are especially acute where limited digital infrastructure and low digital literacy might deepen existing inequalities (Asad et al. 2024; Nguyen and Hoang 2025).

In research contexts, AI accelerates the process by analysing large bodies of data, from historical archives to learner datasets, to identify patterns that are – owing to dataset size – potentially beyond human grasp (Cui and Alias 2024; Kusmiadi and Wahyudin 2024). In their study based on historical document analysis, Hazemali et al. (2024) demonstrated that AI excels at select surface-level processing and data extraction, but falters on tasks demanding interpretation, context sensitivity, or inference. Additionally, AI-assisted writing tools streamline drafting, editing, and literature synthesis, yet they require careful human oversight to maintain scholarly rigour and guard against “black-box” errors, as suggested by Castillo-Martínez et al. (2024) and Ramirez and Esparrell (2024). These capabilities support new methodologies based on (big) data, such as adaptive experimental designs, large-scale sentiment analyses, and interdisciplinary collaborations (Jacques, Moss, and Garger 2024; Orlanda-Ventayen 2024). Yet they also open methodological and ethical questions: how can we assure replicability if algorithms continually develop and change?
Who merits authorship credit for AI-(co-)authored output? To what extent must AI’s internal logic be disclosed, particularly when privacy or intellectual property are at stake (Butson and Spronken-Smith 2024; Yu et al. 2024)?

As this review of recent educational and research developments shows, there is an urgent need for comprehensive ethical and policy frameworks. Institutions must balance AI-mediated automation with rigorous human oversight to protect privacy and academic integrity (Floridi 2023; Ali et al. 2024; Yu et al. 2024), while at the same time promoting training in digital literacy and ensuring that the benefits of AI are not limited to small groups of learners and researchers but are accessible to all (Kusmiadi and Wahyudin 2024; Yu et al. 2024) – one of the crucial tasks of the humanities in the digital world.

2 Overview of the Studies

The articles in this issue are grouped into four thematic clusters – Language, Academic Writing, English Language and Literature Teaching, and Translation Studies – each addressing a particular aspect of AI and its growing role in our work. The boundaries between these clusters are, of course, neither strict nor hermetic, since the issues often venture into interdisciplinary areas. The present volume offers an insight into how scholars, educators, and practitioners can engage with AI not merely as a tool, but as a stimulus for rethinking core assumptions and professional practices in English studies.

2.1 Language: AI as a Tool for Language Analysis

The first two articles investigate the application of generative AI in linguistic analysis. The opening study by Tadej Todorović, Andrej Flogie and Daniel Hari tests ChatGPT, Gemini, and DeepSeek for speech act classification in Harold Pinter’s The Birthday Party.
With an accuracy of 82% under optimized conditions, the results affirm AI’s potential for supporting discourse annotation – particularly when prompts are paired with theoretical grounding, a practice increasingly advocated in AI-assisted humanities research (Lozić and Štular 2023). The second article, by Agata Križan and Aja Barbič, applies Martin and White’s appraisal framework to AI-generated analysis of evaluative language. The coding results produced by ChatGPT and Microsoft Copilot were compared and then reviewed by human analysts, revealing an encouraging overlap in basic categorization but a lack of nuance in the AI-generated responses. This reflects a recurring challenge in AI-driven textual analysis: the tendency to prioritize formal correctness over content accuracy or critical precision (Gonzalez Garcia and Weilbach 2023).

2.2 Academic Writing: Supporting Writing with AI

Three articles address AI’s impact on student writing and engagement. In the first, Silvana Neshkovska reviews the literature on ChatGPT’s role in academic writing. While highlighting benefits in autonomy and motivation, the study warns against the ethical pitfalls of AI overuse. The blurred lines between assistance and authorship remain a pressing concern, particularly in educational contexts where writing is also a process of knowledge construction (Altmäe, Sola-Leyva, and Salumets 2023; Abadie, Chowdhury, and Mangla 2024; Asad et al. 2024). The second article in this section, by Rashmika Lekamge and Clayton Smith, explores how learners of English as a Second Language (ESL) interact with auto-correction tools such as the one provided in Microsoft Word. While the tools reduced surface-level errors, extended reliance on them led to lower self-editing skills and writing confidence – a dynamic mirrored in recent AI-based writing support tools (Kasneci et al. 2023; Kohnke, Zou, and Su 2025).
In a study of Indonesian university students, Tommy Hastomo, Andini Septama Sari, Utami Widiati, Francisca Maria Ivone, Evynurul Laily Zen, and Muhammad Fikri Nugraha Kholid show that chatbot engagement, particularly behavioural and cognitive, correlates with improved English proficiency. This confirms emerging research suggesting that AI tools can support language acquisition and enhance vocabulary, grammar, and writing fluency if engagement is active, reflective, and task-focused (Ali et al. 2024; Krishnan and Zaini 2025).

2.3 English Language and Literature Teaching: Teacher Attitudes, Competence, and Professional Development

In this section, two studies explore how language educators respond to AI in the classroom. A survey conducted by Saša Jazbec, Bernarda Leva and Marta Licardo among Slovenian teachers finds that while AI is mostly not viewed as a threat, it is seen as a disruptor – requiring shifts in instructional design and professional identity. This echoes recent concerns about the social and psychological effects of AI in education (Suchithra and Arya 2025; Kasneci et al. 2023) and is consistent with Krishnan and Zaini’s (2025) conclusion that AI’s potential can be realized only when educators are well-trained and supported in its use. A survey of Croatian EFL teachers by Bojan Prosenjak and Eva Jakupčević likewise reveals mixed levels of digital competence. Professional development is therefore essential – not only for skill-building but for helping educators and pre-service teachers form balanced, critical views of AI. This same goal is reinforced by Butler and Jiang (2025), who found that less confident users of ChatGPT were more likely to accept its output uncritically.
2.4 Translation Studies: Exploring AI’s Role in Language Mediation

The final section, containing four articles, examines translation issues and practices in the new context of AI presence. In the first of the four articles, Nataša Gajšt examines business correspondence translated with the help of ChatGPT, Claude, and Gemini. The author concludes that while the output was mostly usable, inconsistencies in tone and register demonstrate the need for human editorial judgment – a finding echoed in other recent research outside the area of translation (e.g., Hazemali et al. 2024). Another study, by Simon Zupan, Zmago Pavličič and Melanija Larisa Fabčič, explores machine translation of nominal phrases in technical texts. With nearly half the phrases mistranslated, the study exposes the limits of current LLMs in high-density, context-dependent language – a familiar challenge for AI language models that, according to Boros et al. (2024), still struggle with specialized corpora. Unsurprisingly, metaphor translation presents another difficulty that AI cannot yet successfully address. While the students in the experiment appreciated using AI tools, errors in figurative language revealed the tools’ limitations. The author, Marija Brala Vukanović, however, argues that these inaccuracies can be turned into didactic benefits under the guidance of a skilled teacher. The section closes with a study on AI-powered transcreation in cross-cultural marketing. Surprisingly, the authors Ghodrat Hassani, Marziyeh Malekshahi and Hossein Davari find that trained students outperformed professionals after using ChatGPT tools, which underlines the importance of quality prompt engineering and guided learning for an optimal outcome. According to Gonzalez Garcia and Weilbach (2023), this is particularly relevant in domains where cultural resonance is as crucial as linguistic accuracy.
3 Conclusion

The contributions to this special issue collectively show that artificial intelligence is no longer a peripheral novelty but a pertinent phenomenon that has already won a visible position in English studies. We can expect its relevance and status to grow stronger and more central in the future, regardless of the discipline or subfield of English studies – a view reflected in these articles, which offer both a critical and a constructive account of AI’s growing influence. Apart from this general understanding, the studies reach another shared conclusion: AI tools are only as effective and ethical as the human users who operate them, drawing on their own expertise and ethical judgment. It is therefore crucial to strive for a thoughtful and responsible integration of AI in academic and professional work.

This issue of ELOPE does not seek to offer final answers but rather to open new questions and inquiries. Teachers, researchers, translators, and others engaged in English studies are uniquely positioned to shape the newly emerging relationship between language and technology. The questions raised here – about accuracy, agency, pedagogy, and professional roles – will continue to define our fields in the years ahead. It is our hope that this collection provides a valuable foundation for those navigating, critiquing, and contributing to the future of AI in English language studies.

References

Abadie, Amelie, Soumyadeb Chowdhury, and Sachin Kumar Mangla. 2024. “A shared journey: Experiential perspective and empirical evidence of virtual social robot ChatGPT’s priori acceptance.” Technological Forecasting and Social Change 201: 123202. https://doi.org/10.1016/j.techfore.2023.123202.

Ali, Omar, Peter A. Murray, Mujtaba Momin, Yogesh K. Dwivedi, and Tegwen Malik. 2024.
“The effects of artificial intelligence applications in educational settings: Challenges and strategies.” Technological Forecasting and Social Change 199: 123076. https://doi.org/10.1016/j.techfore.2023.123076.

Altmäe, Signe, Antonio Sola-Leyva, and Andres Salumets. 2023. “Artificial intelligence in scientific writing: A friend or a foe?” Reproductive BioMedicine Online 47 (1): 3–9. https://doi.org/10.1016/j.rbmo.2023.04.009.

Asad, Muhammad Mujtaba, Shafaque Shahzad, Syed Hassan Ali Shah, Fahad Sherwani, and Norah Mansour Almusharraf. 2024. “ChatGPT as artificial intelligence-based generative multimedia for English writing pedagogy: Challenges and opportunities from an educator’s perspective.” International Journal of Information and Learning Technology 41 (5): 490–506. https://doi.org/10.1108/ijilt-02-2024-0021.

Boros, Emanuela, Maud Ehrmann, Matteo Romanello, Sven Najem-Meyer, and Frédéric Kaplan. 2024. “Post-correction of historical text transcripts with large language models: An exploratory study.” In Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024), St. Julians, Malta, 133–59. Association for Computational Linguistics. https://aclanthology.org/2024.latechclfl-1.14.pdf.

Butler, Yuko G., and Shiyu Jiang. 2025. “How do pre-service language teachers perceive generative AIs’ affordance?: A case of ChatGPT.” System 129: 103606. https://doi.org/10.1016/j.system.2025.103606.

Butson, Russell, and Rachel Spronken-Smith. 2024. “AI and its implications for research in higher education: A critical dialogue.” Higher Education Research & Development 43 (3): 563–77. https://doi.org/10.1080/07294360.2023.2280200.

Castillo-Martínez, Isolda Margarita, Daniel Flores-Bueno, Sonia M. Gómez-Puente, and Victor O. Vite-León. 2024. “AI in higher education: A systematic literature review.” Frontiers in Education 9: 1391485. https://doi.org/10.3389/feduc.2024.1391485.
Cui, Pengfei, and Bity Salwana Alias. 2024. “Opportunities and challenges in higher education arising from AI: A systematic literature review (2020–2024).” Journal of Infrastructure, Policy and Development 8 (11): 8390. https://doi.org/10.24294/jipd.v8i11.8390.

Dwivedi, Yogesh K., Laurie Hughes, Elvira Ismagilova, Gert Aarts, Crispin Coombs, Tom Crick, et al. 2021. “Artificial Intelligence (AI): Multidisciplinary Perspectives on Emerging Challenges, Opportunities, and Agenda for Research, Practice and Policy.” International Journal of Information Management 57: 101994. https://doi.org/10.1016/j.ijinfomgt.2019.08.002.

Floridi, Luciano. 2023. The Ethics of Artificial Intelligence: Principles, Challenges, and Opportunities. Oxford University Press.

Gonzalez Garcia, Giselle, and Christian Weilbach. 2023. “If the sources could talk: Evaluating large language models for research assistance in history.” In CHR 2023: Computational Humanities Research Conference, December 6–8, 2023, Paris, France. https://doi.org/10.48550/arXiv.2310.10808.

Hazemali, David, Janez Osojnik, Tomaž Onič, Tadej Todorović, and Mladen Borovič. 2024. “Evaluating chatbot assistance in historical document analysis.” Moderna arhivistika 7 (2): 53–83. https://doi.org/10.54356/ma/2024/biub3010.

Jacques, Paul H., Hollye K. Moss, and John Garger. 2024. “A synthesis of AI in higher education: Shaping the future.” Journal of Behavioral and Applied Management 24 (2): 103–11. https://doi.org/10.21818/001c.122146.

Kasneci, Enkelejda, Kathrin Sessler, Stefan Küchemann, Maria Bannert, Daryna Dementieva, Frank Fischer, Urs Gasser, Georg Groh, Stephan Günnemann, Eyke Hüllermeier, Stephan Krusche, Gitta Kutyniok, Tilman Michaeli, Claudia Nerdel, Jürgen Pfeffer, Oleksandra Poquet, Michael Sailer, Albrecht Schmidt, Tina Seidel, Matthias Stadler, Jochen Weller, Jochen Kuhn, and Gjergji Kasneci. 2023.
“ChatGPT for good? On opportunities and challenges of large language models for education.” Learning and Individual Differences 103: 102274. https://doi.org/10.1016/j.lindif.2023.102274.

Kohnke, Lucas, Di Zou, and Fan Su. 2025. “Exploring the potential of GenAI for personalised English teaching: Learners’ experiences and perceptions.” Computers and Education: Artificial Intelligence 8: 100371. https://doi.org/10.1016/j.caeai.2025.100371.

Krishnan, Vekneswary, and Hafiz Zaini. 2025. “A systematic literature review on artificial intelligence in English language education.” International Journal of Research and Innovation in Social Science 9 (1): 82–88. https://doi.org/10.47772/IJRISS.2025.903SEDU0002.

Kusmiadi, Kusmiadi, and Didin Wahyudin. 2024. “The role of artificial intelligence (AI) software in education and research: A systematic literature review.” Journal of Vocational Education Studies 7 (2): 191–208. https://doi.org/10.12928/joves.v7i2.10387.

Lozić, Edisa, and Benjamin Štular. 2023. “Fluent but not factual: A comparative analysis of ChatGPT and other AI chatbots’ proficiency and originality in scientific writing for humanities.” Future Internet 15 (10): 336. https://doi.org/10.3390/fi15100336.

Massaty, Muhammad Hassan, Slamet Kurniawan Fahrurozi, and Cucuk Wawan Budiyanto. 2024. “The role of AI in fostering computational thinking and self-efficacy in educational settings: A systematic review.” Indonesian Journal of Informatics Education 8 (1): 52–64. https://doi.org/10.20961/ijie.v8i1.89596.

Msambwa, Msafiri Mgambi, Zhang Wen, and Daniel Kangwa. 2025. “The impact of AI on the personal and collaborative learning environments in higher education.” European Journal of Education 60: e12909. https://doi.org/10.1111/ejed.12909.

Nguyen, Thanh Huyen, and Thi Ngoc Hien Hoang. 2025. “Investigating the promises and perils of generative AI in EFL learning in higher education: A literature review.” AsiaCALL Online Journal 16 (1): 1–25.
https://doi.org/10.54855/acoj.251611.

Orlanda-Ventayen, Caren Casama. 2024. “Empowering education through transformative role of artificial intelligence (AI) in teaching and learning: Educators’ perspective and research trends.” In 9th International Conference on Information Technology and Digital Applications (ICITDA), Nilai, Negeri Sembilan, Malaysia, 1–5. IEEE. https://doi.org/10.1109/ICITDA64560.2024.10809596.

Ramirez, Elkin Arturo Betancourt, and Juan Antonio Fuentes Esparrell. 2024. “Artificial intelligence (AI) in education: Unlocking the perfect synergy for learning.” Educational Process International Journal 13 (1): 35–51. https://doi.org/10.22521/edupij.2024.131.3.

Suchithra, V. G., and C. S. Arya. 2025. “The study on ethics and biases in AI-powered education.” European Journal of Contemporary Education and E-Learning 3 (2): 37–43. https://doi.org/10.59324/ejceel.2025.3(2).04.

Yu, Ji Hyun, Devraj Chauhan, Rubaiyat Asif Iqbal, and Eugene Yeoh. 2024. “Mapping academic perspectives on AI in education: Trends, challenges, and sentiments in educational research (2018–2024).” Educational Technology Research and Development 73: 199–227. https://doi.org/10.1007/s11423-024-10425-2.

Part II: Language

2025, Vol. 22 (1), 19-34(228)
journals.uni-lj.si/elope
https://doi.org/10.4312/elope.22.1.19-34
UDC: [81’33:821.111.09-2]:004.89

Tadej Todorović, Andrej Flogie, Daniel Hari
University of Maribor, Slovenia

Generative AI in Pragmatics: Assessing the Accuracy of Automated Speech Act Classification in Pinter’s The Birthday Party

ABSTRACT

This study explores the feasibility of using generative AI to automate speech act annotation in Harold Pinter’s play The Birthday Party. Three chatbots – ChatGPT, Gemini, and DeepSeek – were tested under three scenarios varying in the amount of theoretical material provided.
Each chatbot’s output was compared to a manually annotated reference via a Python script measuring classification accuracy. Scenario 2 produced the highest accuracy overall (75–82%), while Scenario 1 underperformed, owing to incorrect reliance on external typologies, and Scenario 3 showed signs of overfitting. ChatGPT o1 emerged as the most accurate model, achieving 82% accuracy in Scenario 2. The findings suggest that GenAI chatbots can serve as valuable preliminary annotators when good prompt engineering and well-curated theoretical material are provided. Future research could extend this methodology to more context-dependent texts, further refining prompt-engineering strategies and exploring larger linguistic corpora.

Keywords: pragmatics, speech act analysis, ChatGPT, DeepSeek, Gemini, Pinter

Generativna umetna inteligenca v pragmatiki: analiza natančnosti samodejne klasifikacije govornih dejanj v Pinterjevi drami Zabava za rojstni dan

IZVLEČEK

Študija raziskuje smiselnost rabe generativne umetne inteligence (ChatGPT, Gemini in DeepSeek) za avtomatizacijo anotacije govornih dejanj v Pinterjevi drami Zabava za rojstni dan. Trije klepetalni roboti – ChatGPT, Gemini in DeepSeek – so bili testirani v treh scenarijih, ki so se razlikovali glede na obseg predloženega teoretičnega gradiva. Rezultati vsakega klepetalnega robota so bili primerjani z ročno anotirano različico s pomočjo Python skripte, ki je izmerila natančnost klasifikacije. Scenarij 2 je na splošno dosegel najvišjo natančnost (75–82 %), medtem ko je bil scenarij 1 zaradi neustreznega zanašanja na tuje tipologije preslab, scenarij 3 pa je kazal znake preprileganja (angl. overfitting). ChatGPT o1 se je izkazal za najnatančnejši model, saj je v scenariju 2 dosegel 82-odstotno zanesljivost. Ugotovitve kažejo, da lahko klepetalni roboti GEN-UI služijo kot koristni predhodni anotatorji, če so na voljo dobro zasnovani pozivi in dobro pripravljeno teoretično gradivo.
Prihodnje raziskave bi lahko to metodologijo razširile na besedila, ki so bolj odvisna od konteksta, nadalje izpopolnile strategije inženiringa pozivov in raziskale večje jezikovne korpuse.

Ključne besede: pragmatika, analiza govornih dejanj, ChatGPT, DeepSeek, Gemini, Pinter

1 Introduction

In this paper, we examine the potential of generative artificial intelligence (GenAI) for assisting and optimizing research in pragmatics. Specifically, we focus on research using speech act analysis, a powerful tool that enables quantitative and qualitative insight into various pragmatic topics of interest, such as mediation (Kádár et al. 2024; House et al. 2024), small talk (House and Kádár 2023), and bargaining (Liu, House, and Kádár 2024). Such research can yield unique and robust results in pragmatics; however, the initial data collection process is often time-consuming, as it requires the researchers to manually annotate the data using some form of speech act typology. A tool such as GenAI that could either perform or, at the very least, facilitate this initial process would thus be especially beneficial for researchers in this field.

We thus performed a case study, testing the potential of selected GenAI tools (ChatGPT, Gemini, and DeepSeek) for annotating Harold Pinter’s early play The Birthday Party (Pinter 1991) using a finite speech act typology developed by Edmondson, House, and Kádár (Edmondson and House 1981; Edmondson, House, and Kádár 2023). We decided to test the chatbots’ capabilities using a literary work because historical documents, interviews, or other recordings usually require additional context for successful annotation, whereas a literary work is as close to a self-contained whole as possible.
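The accuracy figures reported in the abstract come from a Python script that compares each chatbot’s annotations against the manual reference. The script itself is not reproduced in the paper, so the following is only a minimal sketch of such a comparison: the data format (utterance-number/label pairs), the function name, and the speech act labels are all illustrative assumptions, not the authors’ actual implementation.

```python
# Sketch of an annotation-accuracy check. Each annotation is assumed to be
# a list of (utterance_id, speech_act_label) pairs; the labels below are
# invented for illustration and do not reproduce the Edmondson/House/Kádár
# typology used in the study.

def annotation_accuracy(reference, predicted):
    """Share of utterances whose predicted speech act matches the manual label."""
    ref = dict(reference)
    pred = dict(predicted)
    # Compare only utterances present in both versions, since a chatbot
    # may skip or merge utterances.
    shared = ref.keys() & pred.keys()
    if not shared:
        return 0.0
    hits = sum(ref[u] == pred[u] for u in shared)
    return hits / len(shared)

# Toy example: the chatbot mislabels one of four utterances.
manual  = [(1, "Request"), (2, "Tell"), (3, "Greet"), (4, "Suggest")]
chatbot = [(1, "Request"), (2, "Tell"), (3, "Opine"), (4, "Suggest")]

print(annotation_accuracy(manual, chatbot))  # 0.75
```

Restricting the comparison to utterances covered by both annotations is one design choice among several; a stricter script might instead count missing utterances as errors.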
Besides identifying the most appropriate chatbot for the task, we also set out to determine the best prompt (prompt engineering) that researchers could use for this work. In doing so, we developed three scenarios for testing each chatbot: (1) instructing the chatbot to research the speech act typology online, providing it with a short annotated excerpt from The Birthday Party, and then instructing it to annotate the remainder of Act One; (2) providing the chatbot with a short description of the speech act typology (20 pages) and with a short annotated excerpt from The Birthday Party, and then instructing it to annotate the remainder of Act One; and (3) providing the chatbot with an exhaustive description of the speech act typology (80 pages) and with a short annotated excerpt from The Birthday Party, and then instructing it to annotate the remainder of Act One. Finally, we analysed the results by comparing the automatic annotations to the version of The Birthday Party manually annotated by human experts. In doing so, we endeavoured to answer the following research questions:

1. How successful are chatbots in providing an automatically annotated text in line with the given speech act typology?
2. Which scenario yields the best outcome (the highest fidelity to manual annotation)?
3. Are chatbots useful for performing such preliminary annotations or, at the very least, facilitating this process?

2 Related Work

GenAI chatbots have been recognized as useful in many domains, including time-consuming tasks such as literature reviews, citation management, proofreading, summarizing, and paraphrasing (Stokel-Walker 2023; Else 2023; Altmäe, Sola-Leyva, and Salumets 2023). They have been tested in various academic fields, such as machine translation of literary works (Mohar, Orthaber, and Onič 2020) and assistance in analysing historical documents (Hazemali
2024) with varying degrees of success. In the field of pragmatics, there are several studies that examine chatbots themselves and their utterances, such as exploring Gricean Maxims to help inform the basic design of effective conversational interaction (Setlur and Tory 2022), using AI-generated conversations as human-like data for pragmatic analysis (Chen, Li, and Ye 2024), and investigating the politeness strategies of chatbots (Monteiro, Pereira, and Salgado 2023). In the field of speech act analysis, a recent study has examined whether chatbots are capable of assertion (Williams and Bayne 2024). Most of these studies focus on studying GenAI and generating new knowledge by examining their behaviour, which we believe to be a worthy endeavour; however, chatbots can also be useful as facilitators of research in sometimes painstakingly slow processes, such as annotating large volumes of text, utterance by utterance, using a specific speech act typology. To our knowledge, no one has yet tried to use the capabilities of chatbots to aid in the analysis of such data.
3 Data and Methodology
3.1 Data
Our data includes a manual annotation of Harold Pinter’s The Birthday Party (Pinter 1991), one of Pinter’s most frequently performed works, with some critics ranking it among the greatest dramatic achievements of British theatre (Hribar 2004; Gavez 2016; Onič 2016). The play follows the events unfolding in a boarding house in an English seaside town, run by Meg and Petey Boles. It begins with an ostensibly mundane breakfast conversation between Meg and Petey and eventually transforms into a psychological play in which two strangers, Goldberg and McCann, arrive at the boarding house, searching for Stanley, one of the “permanent” guests. We chose this play for three reasons.
First, we wanted to analyse chatbots’ capabilities in analysing a literary work, which, compared to historical documents or diplomatic transcripts, is as close to a self-contained whole as possible. A speech act annotation of historical documents requires additional outside context, i.e., knowledge of the complex political situation during which the analysed discourse took place, so that the researchers can attribute the correct speech acts to participants based on both their statements and their motivations in the context of the political situation (insofar as this information is known). For example, in the case of a speech act annotation of a mediation event between the EEC and the Slovenian and Croatian states, the authors first had to familiarize themselves with the political context that surrounded that mediation attempt (Kádár et al. 2024). Second, among the various types of literary works, plays are most appropriate for speech act annotation and subsequent analysis, as they feature (almost exclusively) direct speech, whereas other forms of literature, like novels or short stories, also include the narrative voice, which cannot be analysed in this way. Moreover, the resemblance to ordinary everyday conversation that contains elements of naturally occurring discourse, such as hesitations, repetitions, self-corrections, or non-sequiturs, is highest in contemporary drama (Podbevšek and Žavbi 2021; Onič and Prajnč Kacijan 2020), as opposed to, for example, the language of Elizabethan drama, which is highly poetic. Third, The Birthday Party is one of the most exemplary absurdist plays; additionally, Pinter’s use of dialogue is characterized as “standard English, but the conversation doesn’t get anywhere”
(Schechner 1966, 176): i.e., Pinter’s dramatic dialogue adheres to certain established everyday paradigms, typically English small-talk clichés, in which certain regularities apply which, whether written or unwritten, are quite firmly rooted in the English tradition (Onič 2016). This makes Pinter’s plays, and The Birthday Party in particular, ideal material for a pragmatic analysis that utilizes this speech act typology.
3.2 Methodology
The purpose of the paper is to establish which AI tool is best suited for analysing and annotating a text, in our case a play, using a finite speech act typology developed by Edmondson, House, and Kádár (Edmondson and House 1981; Edmondson, House, and Kádár 2023). We chose this speech act typology because, compared to other speech act typologies, it is finite, which prevents the invention of new speech acts. This allows for comparison of different texts and ensures replicability of the research thus produced (Kádár et al. 2024). Furthermore, the typology has been widely and successfully used in pragmatics, as evidenced by the influential works of various authors in the field (House 1996; Edmondson, House, and Kádár 2023; Taguchi and Kádár 2025). To determine the best tool for the job, we compare the results (the annotated play) of three AI tools available on the market to a manual annotation of The Birthday Party. The AI tool producing an AI-generated version with the fewest discrepancies compared to the manual annotation will be considered the most appropriate; our methodology is thus fundamentally contrastive. For the annotation, we chose the following AI tools: ChatGPT (OpenAI), Gemini (Google), and DeepSeek. While both ChatGPT and Gemini use a transformer-based architecture, they nevertheless utilize different training data: ChatGPT uses a massive dataset of both text and human-annotated examples, whereas Gemini uses a proprietary dataset curated by Google.
Additionally, the two have different strengths: advanced language understanding in Gemini’s case and exceptional conversational ability in the case of ChatGPT (Rane, Choudhary, and Rane 2024). Both language understanding and conversational ability are variables relevant to pragmatics and might explain the differences in the final output. The final tool, DeepSeek, was added because of its lower cost of development and usage compared to ChatGPT and Gemini (DeepSeek-AI et al. 2025), even though it retains their capabilities and uses a new architecture – a more collective approach that uses a mixture of specialized neural networks working in conjunction rather than a single massive, unified AI system (Moors 2025). Furthermore, since DeepSeek is open-source and relatively easy to run because of its lower resource usage, it would be much easier to run locally and thus avoid various privacy and security concerns related to using generative AI tools. The selected tools have different context windows: ChatGPT supports contextual lengths of up to 128,000 tokens, Gemini up to one million tokens, and DeepSeek up to 163,840 tokens. This allows for the processing of very long documents or complex conversations while preserving the full context, which is ideal for our study and, if our study is successful, for the analysis of entire corpora of texts using this approach (e.g., hundreds of absurdist plays). To simultaneously determine the best procedure (prompt engineering) for producing optimal output (an annotation of The Birthday Party that deviates the least from the manual annotation), we tested the three tools in three different scenarios.
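As a back-of-the-envelope illustration of these limits, the following sketch estimates whether a text fits a given context window, assuming roughly four characters per token (a common heuristic only; real tokenizers vary, and the dictionary keys and function names here are ours, not part of any tool’s API):

```python
# Rough sketch: estimate whether a text fits a model's context window.
# Assumes ~4 characters per token, a common heuristic; exact counts
# depend on the tokenizer, so treat this as an approximation only.

CONTEXT_WINDOWS = {          # token limits cited in the text above
    "ChatGPT": 128_000,
    "Gemini": 1_000_000,
    "DeepSeek": 163_840,
}

def estimated_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Crude token estimate from the character count."""
    return int(len(text) / chars_per_token)

def fits_context(model: str, text: str) -> bool:
    """True if the estimated token count fits the model's context window."""
    return estimated_tokens(text) <= CONTEXT_WINDOWS[model]
```

By this estimate, a 400,000-character script (about 100,000 tokens) would fit all three windows, while a 1,000,000-character corpus (about 250,000 tokens) would exceed the ChatGPT and DeepSeek limits but not Gemini’s.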
In the first scenario (1), we instructed the chatbot to research the specific speech act typology online, then uploaded a sample of the manual annotation (approximately 150 lines) for the chatbot to analyse, and finally instructed the chatbot to produce its own annotation of the remainder of Act one – the play was provided in .docx format, with each utterance numbered and on a new line; the manually annotated sample followed the same format. In the second scenario (2), we did not instruct the chatbot to research the typology but instead provided a short description of the typology and its speech acts (approximately 20 pages) from a referential work (Edmondson, House, and Kádár 2023), alongside the same sample of manual annotation that we used in the first scenario. The remaining instructions for the chatbot were the same: to produce its own annotation of the remainder of Act one. The third scenario (3) differed from the second in that we uploaded a much more exhaustive and comprehensive description of the speech act typology (approximately 80 pages) from the same source (Edmondson, House, and Kádár 2023), while the other steps remained the same as in the second scenario. We wanted to utilize the best available iterations of GenAI in use; however, different iterations have different capabilities: for instance, some allow uploading of texts and files, and some do not; some can research content online, while others cannot. To address such discrepancies, we had to slightly modify our prompts for each specific GenAI model used. Considering the rapid development of new iterations, we consider it essential to specify the iterations used in our study and their capabilities at this time. In testing ChatGPT, we intended to utilize the o1 iteration, which generates longer trains of thought before providing an answer (Wang et al. 2024). We decided against using ChatGPT o3, based on various technical and deployment-related parameters.
Although ChatGPT o3 is a newer model, it is currently a smaller, more latency- and cost-optimized “mini” version, which makes it less appropriate for complex linguistic and pragmatic tasks. In recent comparative studies (Raffel et al. 2023), such smaller and/or more resource-efficient models have yielded lower performance in deeper discourse understanding and reduced capacity for long-term context retention in comparison to larger models like o1 or GPT-4o. Furthermore, ChatGPT o1 supports a larger number of parameters dedicated to more advanced forms of reasoning and linguistic understanding (Wang et al. 2024), which we believe is crucial for tasks such as speech act classification. However, ChatGPT o1 cannot currently research topics online, so we could not use it for Scenario 1. Instead, we used ChatGPT 4o for Scenario 1 and the more powerful ChatGPT o1 for Scenarios 2 and 3 (this is noted in the results). Nor can ChatGPT o1 read PDFs or documents, so the materials (the manual annotation and the long and short theory) were provided in the prompt itself. In testing Gemini, we intended to utilize the 2.0 PRO Experimental iteration, yet, as with ChatGPT, different iterations offer different capabilities. We chose Gemini 2.0 PRO Experimental because, according to Google’s own documentation and independent analyses (Gemini Team et al. 2024; Chowdhery et al. 2022), it produces more advanced results in tasks related to logical reasoning and extended context retention – both of which are crucial for speech act analysis. However, because Gemini 2.0 PRO Experimental does not currently support online research, we utilized Gemini 2.0 Flash for Scenario 1.
In terms of technical specifications and reasoning capabilities, Gemini 2.0 PRO Experimental is also the closest model to ChatGPT o1, making a direct comparison of their outcomes the most methodologically sound approach. This enabled us to rule out effects stemming from significant differences in architecture or dataset size and focus instead on the models’ actual ability to classify speech acts. However, it has the same limitations as ChatGPT o1 – it cannot research topics online or read PDFs or documents. Similarly to the prompt engineering for ChatGPT, we adapted the prompts for Scenario 2 and Scenario 3 for Gemini 2.0 PRO Experimental by providing the materials in the prompt itself. For Scenario 1, we used Gemini 2.0 Flash, which can research topics online and read documents, so the prompt was not further adapted. In testing DeepSeek, we utilized the DeepSeek-R1 iteration, an open-source model that enables both online research and the upload of PDF and DOCX documents (DeepSeek-AI et al. 2025; Mercer, Spillard, and Martin 2025). Furthermore, DeepSeek-R1 is, compared to ChatGPT o1 and Gemini 2.0 PRO Experimental, free to use and requires fewer resources to operate. Like ChatGPT o1 and Gemini 2.0 PRO Experimental, it offers enhanced reasoning capabilities and can process longer texts, making a direct comparison among the three models (ChatGPT o1, Gemini 2.0 PRO Experimental, and DeepSeek-R1) fully justified. Another advantage of DeepSeek-R1 is its open-source nature and relatively low computational demands, allowing for simpler local deployment and thus direct protection of sensitive data (Mercer, Spillard, and Martin 2025). According to published benchmarks (DeepSeek-AI et al.
2025), DeepSeek-R1 achieves statistically similar results to closed-source solutions on comparable text-intensive tasks, i.e., it should deliver at least an equivalent level of accuracy for speech act classification, while being free to use and requiring fewer resources to operate compared to ChatGPT o1 and Gemini 2.0 PRO Experimental.
For the quantitative comparison between the manually annotated classification (reference text) and the AI-generated classifications, we developed a Python script that automatically performs the following:
• Reads .docx files containing both the reference classification and the AI-generated classifications.
• Compares the corresponding speech act for each line/utterance in the dialogue.
• Measures the similarity between sentences (using SequenceMatcher and the Hungarian algorithm1 for optimal line alignment).
• Aligns the reference and the AI-predicted speech act types and calculates the number of mismatches.
• Calculates the classification accuracy, i.e., the percentage of lines that were annotated correctly compared to the reference.
1 Also known as the Kuhn-Munkres algorithm (Kuhn 1955) – this is a classic algorithm for solving the assignment problem in combinatorial optimization, where the best match between elements of two sets is sought to maximize the total similarity or minimize the total distance.
4 Results and Discussion
We have divided the results into two parts. In the first, we present statistical data for all three scenarios, analysing the overall success rate of all three GenAI tools in this task and the general trends. In the second, we focus on a more detailed analysis of the mistakes made by the most successful chatbot, and the general trends of those mistakes, with specific examples from each speech act category.
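A minimal sketch of the core of such a comparison script, with greedy SequenceMatcher matching standing in for the Hungarian-algorithm alignment and plain Python lists standing in for the .docx parsing (all function names are ours, not the study’s code):

```python
from difflib import SequenceMatcher

def best_alignment(ref_utts, ai_utts):
    """Greedily pair each reference utterance with its most similar
    AI-annotated utterance (a simplified stand-in for the Hungarian
    algorithm's globally optimal assignment)."""
    pairs, used = [], set()
    for i, ref in enumerate(ref_utts):
        best_j, best_score = None, -1.0
        for j, ai in enumerate(ai_utts):
            if j in used:
                continue
            score = SequenceMatcher(None, ref, ai).ratio()
            if score > best_score:
                best_j, best_score = j, score
        used.add(best_j)
        pairs.append((i, best_j))
    return pairs

def classification_accuracy(reference, ai_annotated):
    """reference / ai_annotated: lists of (utterance, speech_act) tuples.
    Returns the share of aligned lines whose speech acts agree."""
    pairs = best_alignment([u for u, _ in reference],
                           [u for u, _ in ai_annotated])
    correct = sum(
        1 for i, j in pairs
        if reference[i][1].lower() == ai_annotated[j][1].lower()
    )
    return correct / len(reference)
```

For example, if a three-utterance reference is annotated Opine/Opine/Leave-Take and the AI output reads Tell/Opine/Greet for the same lines, the function returns an accuracy of one third.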
Unsurprisingly, the chatbots were least successful in Scenario 1, where they were instructed to conduct online research on the speech act typology and, with the help of the manually annotated example, annotate the text. Our main concern was that the chatbots would not be able to differentiate between the various speech act typologies online and choose the one we prescribed. We hoped that providing the manually annotated example would ground the chatbots and steer them to the correct speech act typology; unfortunately, this was not the case. Both ChatGPT 4o and Gemini 2.0 Flash utilized speech acts that were outside the prescribed typology, with ChatGPT classifying 537 utterances and Gemini 2.0 Flash classifying fifty-six utterances with categories outside the classification. Surprisingly, DeepSeek-R1 managed to utilize the correct typology, yet its accuracy was still only 29%. We believe this result can be explained by the fact that, for Scenario 1, we were forced to use less capable iterations of ChatGPT and Gemini (ChatGPT 4o instead of ChatGPT o1 and Gemini 2.0 Flash instead of Gemini 2.0 PRO Experimental) because of prompt limitations: the more capable models do not yet have online research enabled. We conjecture that using the more capable models would improve adherence to the correct typology, considering the similarities in the results in the other scenarios. That being said, the results of Scenario 1 indicate a clear winner: Gemini 2.0 Flash. Despite being a less capable model and despite assigning fifty-six utterances to speech act categories outside the prescribed typology, Gemini 2.0 Flash achieved an accuracy of 63%, which is more than twice as good as the other two chatbots, as shown in the table below.
Table 1. Accuracy of Chatbots in Scenario 1 (autonomous online research + manually annotated example).
                    Total speech acts   Correct classifications   Mismatched classifications   Accuracy (%)
ChatGPT 4o          762                 201                       561                          26
Gemini 2.0 Flash    762                 480                       282                          63
DeepSeek-R1         762                 221                       541                          29
Furthermore, a detailed examination of the results for Gemini 2.0 Flash in Scenario 1 indicates that it might have performed even better. It misclassified thirty-three instances of the speech act Request as Command. Commands are not in our speech act typology, yet they do belong under Request, so it could be argued that Gemini 2.0 Flash still correctly recognized the pragmatic intent behind these misclassifications. On the other hand, it misclassified relatively “easy” categories: for example, it misclassified all four instances of Leave-Take, a ritualistic speech act that signifies the termination of an encounter between two speakers. This is usually performed via tokens such as “Good night,” “Bye,” “See you,” “Cheerio,” etc. Nevertheless, Gemini 2.0 Flash classified three Leave-Takes, “Ta-ta, Mrs. Boles,” “Ta-ta,” and “Ta-ta, Stan,” as a Greet, a speech act utilized for acknowledging the presence of the interlocutor (“Hello,” “Hi,” etc.), and one Leave-Take, “See you later,” as a Resolve (an illocution used to express the speaker’s actions). Furthermore, it classified an instance of How-are-you (“How are you keeping, Mrs Boles?”), another ritualistic speech act, as a Request. When using chatbots to facilitate researchers’ analyses, we would require them to at least identify the “easy” ritualistic speech acts, such as Leave-Take and How-are-you, which are codified by only a few almost universal utterances.
The fact that Gemini 2.0 Flash, as the most successful GenAI tool in Scenario 1, failed to do that, in conjunction with the fact that it achieved only 63% total accuracy (still impressive, especially considering the much lower accuracy of ChatGPT 4o and DeepSeek-R1), means that using this scenario would not aid researchers in their work. Scenario 2 was much more successful, producing the most accurate classifications of all three scenarios across all chatbots. The chatbots were instructed to use only a short excerpt of the theory, which resulted in no “phantom” speech act classifications – all three chatbots used only the appropriate twenty-five speech acts from the finite speech act typology in their annotation. All three chatbots were comparable in their results, with ChatGPT o1 emerging on top, having achieved an impressive accuracy of 82%, followed closely by DeepSeek-R1 with 79% accuracy, and Gemini 2.0 PRO Experimental with 75% accuracy. The table below summarizes the success rate of each chatbot.
Table 2. Accuracy of Chatbots in Scenario 2 (short theory + manually annotated example).
                              Total speech acts   Correct classifications   Mismatched classifications   Accuracy (%)
ChatGPT o1                    762                 623                       139                          82
Gemini 2.0 PRO Experimental   762                 575                       187                          75
DeepSeek-R1                   762                 600                       162                          79
ChatGPT o1, the most successful chatbot in this scenario and overall, misclassified only 139 speech acts. Of those, it struggled the most with Opines, which were classified as Tells (39), Remarks (4), Suggests (4), and Complains (2); Resolves, which it classified as Tells (12), Remarks (2), Requests (2), Willings (1), and Promises (1); Complains, misclassified as Requests (8), Remarks (4), Opines (3), and Tells (3); and Remarks, misclassified as Requests (7), Tells (2), Opines (2), and Complains (1). The entire table of misclassifications for ChatGPT o1 in Scenario 2 is presented below.
DeepSeek-R1 misclassified Tells as Discloses (15), Opines (7), Requests (2), Thanks (1), Remarks (1), and Complains; Opines as Tells (14), Complains (12), Requests (4), Remarks (3), Resolves (3), Suggests (1), Willings (1), and Discloses (1); Complains as Requests (12), Opines (10), Tells (4), and Discloses (1); and Remarks as Opines (11), Requests (8), Tells (5), and Resolves (2). Gemini 2.0 PRO Experimental struggled the most with Opines, which were often misclassified as Tells (47), Discloses (3), Resolves (2), and Requests (1); Complains, which were misclassified as Opines (25), Tells (13), Requests (11), Remarks (2), and Discloses (1); Remarks, which were misclassified as Tells (17), Requests (11), and Opines (4); and Resolves, which were misclassified as Tells (9), Requests (4), and Willings (1). We note that similar patterns emerge in all three scenarios: the most frequently misclassified speech acts were Opines, Tells, Complains, Resolves, and Remarks. This is unsurprising, considering the nature of such speech acts. The delineation between Opines and Tells, for example, is largely subjective (Edmondson, House, and Kádár 2023, 169), and the deciding factor is usually the person annotating the text, who “decides” on some criteria (note that this is not fatal for the methodology, as long as the criteria are applied consistently). This means that if the reference text had been annotated differently (yet still consistently), the chatbots’ success rate order might have been reversed. The accuracy would remain in the same range, as the differences between individual styles of annotation would average out. Overall, the results of Scenario 2 represent a (surprisingly) stellar result. At 75–82% accuracy, all three chatbots’ performances could be used for conducting a preliminary classification of speech acts, which would reduce the workload of researchers conducting speech act analysis.
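Per-category breakdowns of this kind amount to a tally over (reference, predicted) pairs. A minimal sketch, with hypothetical names, assuming the two annotation lists are already aligned line by line:

```python
from collections import Counter

def misclassification_tally(reference_acts, predicted_acts):
    """Count how often each reference speech act was misclassified as
    each predicted act; correct classifications are skipped, mirroring
    the mismatch breakdowns reported in this section."""
    tally = Counter()
    for ref, pred in zip(reference_acts, predicted_acts):
        if ref != pred:
            tally[(ref, pred)] += 1
    return tally
```

For instance, tallying the reference sequence Opine/Opine/Request against the predictions Tell/Opine/Request yields a single ("Opine", "Tell") mismatch.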
Before we examine ChatGPT o1’s results in more detail, a presentation of the Scenario 3 results is in order. The results of Scenario 3 were, surprisingly, slightly worse than those for Scenario 2. While the difference was marginal (a few percentage points, as indicated in the table below), it was detectable in all three instances.
Table 3. Misclassifications of ChatGPT o1 in Scenario 2; the left column represents the misclassified reference category, the columns to the right show how ChatGPT o1 categorized them.
             Tell   Opine   Request   Remark   Suggest   Complain   Willing   Resolve   Promise   Total
Opine        39                       5        4         2                                        50
Resolve      12             2         2                             1                   1         18
Complain     3      3       8         4                                                           18
Remark       2      2       7                  1         1                                        13
Request      4      2                 2        1         2                                        11
Tell                4       3         1                                                           8
Minimise     3      2       1                                                                     6
Suggest                     5                                                                     5
Thanks       1      3                                                                             4
Disclose     2                                                                                    2
Willing      1                                                                                    1
Invite       1      1                                                                             2
Greet        1                                                                                    1
Total                                                                                             139
Table 4. Accuracy of Chatbots in Scenario 3 (long theory + manually annotated example).
                              Total speech acts   Correct classifications   Mismatched classifications   Accuracy (%)
ChatGPT o1                    762                 611                       146                          81
Gemini 2.0 PRO Experimental   762                 563                       199                          74
DeepSeek-R1                   762                 579                       183                          76
We conclude that Scenario 1 was not as effective because of the sheer amount of data the chatbots found online, which resulted in conflicting typologies being applied (as evidenced by the use of classifications that were not in the proposed finite typology), yet Scenario 3 suffered from a similar shortcoming. An explanation of this worse outcome might be readily available in studies examining the empirical relationship between dataset size, model size, and compute power in GenAI. These studies establish that while increasing dataset size improves model accuracy, the benefits taper off beyond a certain threshold (Kaplan et al. 2020; Chowdhery et al. 2022). At that point, models may begin to overfit.
Overfitting is a phenomenon in machine learning that occurs when a model fits the training data too well or even exactly, which results in worse performance on any new or unseen data. To further improve the results, techniques such as regularization, pruning, and dropout might be required to mitigate performance degradation (Kaplan et al. 2020; Chowdhery et al. 2022). In Scenario 3, where we provided an exhaustive 80-page theoretical background, the model might have exhibited a tendency towards overfitting, making unnecessary distinctions and misclassifying instances. This mirrors findings in large-scale model training, where excessive data can paradoxically lead to poorer performance because of increased memorization. This aligns with other studies on machine learning, for example, the findings of Kaplan et al. (2020), which highlight diminishing returns when dataset size surpasses a certain threshold. Still, both Scenario 3 and Scenario 2 produced results in the range of approximately 75–82%, which makes them appropriate for research facilitation and, at the very least, preliminary annotation of data. Indeed, we would argue that the results are even better than the purely statistical data shows. A more qualitative approach to the results of the best-performing chatbot, ChatGPT o1 in Scenario 2, confirms that assertion. We can demonstrate this by examining and contextualizing the kinds of mistakes the chatbot made in individual categories.
Opines
Overall, ChatGPT o1 misclassified 50 Opines; however, thirty-nine of those were misclassified as Tells. The delineation between Opines and Tells is subjective, so a different manual annotation might yield even higher accuracy for ChatGPT o1. In fact, from a research perspective, it would be useful to instruct the chatbot to mark any instance of Opine or Tell as Opine/Tell, and the researcher could then produce a more fine-grained verdict based on the needs of the project.
In the case of The Birthday Party, distinguishing between Opines and Tells proves to be especially difficult, as characters often formulate their opinions as facts, for example, in the exchanges between Stanley and Meg.
Example 1.
STANLEY. The milk’s off. Opine/Tell
MEG. It’s not. Opine/Tell
Example 2.
MEG. Perhaps they couldn’t find the place in the dark. Opine/Opine
It’s not easy to find in the dark. Opine/Tell
STANLEY. They won’t come. Opine/Opine
Someone’s taking the Michael. Opine/Complain
Forget all about it. Request/Resolve
It’s a false alarm. Opine/Tell
A fake alarm. Opine/Tell
In Example 1, ChatGPT o1 marked both utterances as Tell; at least for Stanley’s utterance, this might be correct in certain cases. As annotators, we decided to mark this as Opine because it is a statement that Stanley and Meg dispute, and because of the broader context – Stanley’s badgering of Meg. However, Stanley formulates the utterance as a fact (Tell), so one could also adopt a different criterion and classify it as a Tell. Similarly, in Example 2, the line “It’s not easy to find in the dark” could also be classified as a Tell if taken out of context, but we classified it as an Opine because Meg was continuing her speculation from the preceding utterance (“Perhaps they couldn’t find the place in the dark”). Similarly for Stanley’s “It’s a false alarm”: in most cases, this would be considered a Tell, but because we know from earlier that Stanley is continuing his speculation, we label it as an Opine. The misclassifications of chatbots are thus often related to the broader context and actual meaning of the text, which chatbots have not (yet) mastered.
Resolves
Most Resolves were also mislabelled as Tells, especially in instances when a Resolve followed a Request in an Initiate-Satisfy pattern. Requests are often satisfied with either a Resolve or a Tell, and the chatbot had trouble differentiating between the “No” of a Tell (Did you know? No.)
and the “No” of a Resolve (Come here. No.), as in Examples 3 and 4.
Example 3.
GOLDBERG. Well, of course, you must have one. (He stands.) We’ll have a party, eh? What do you say? Request/Request
MEG. Oh yes! Resolve/Tell
Example 4.
MEG. What do you mean? Request/Request
STANLEY. Come over here. Request/Request
MEG. No. Resolve/Tell
Complains
Complains were most often misclassified as Requests. As can be seen in Examples 5 and 6 below, Complains in the text were often formulated as Requests, so we can see why the chatbots classified them as such. Only the broader context (mostly the utterances before and after the Complain) determines that it is, in fact, a Complain. As in the case of Opines, the chatbots were missing this additional context, a deficiency that might be rectified in further studies via smart prompt engineering.
Example 5. (Lulu is scolding Stanley):
LULU. I mean, what do you do, just sit around the house like this all day long? Complain/Request
Hasn’t Mrs Boles got enough to do without having you under her feet all day long? Complain/Request
Example 6.
MCCANN. Sure I trust you, Nat.
GOLDBERG. But why is it that before you do a job you’re all over the place, and when you’re doing the job you’re as cool as a whistle? Complain/Request
MCCANN. I don’t know, Nat.
Remarks
Interestingly, Remarks were most often misclassified as Requests. Remarks are highly ritualistic speech acts, while Requests are substantive speech acts, so the discrepancy is worth addressing. In our manual annotation, we annotated utterances like those in Example 7 as Remarks, as they were often followed by a more substantive Request and they function more as Remarks in the dialogue (Meg replies to the Requests, not the Remarks); however, a different researcher might, like the chatbots, interpret them as very mild Requests.
Example 7.
What’s his name? Request/Request
MEG.
Stanley Webber. Tell/Tell
GOLDBERG. Oh yes? Remark/Request
Does he work here? Request/Request
MEG. He used to work. Tell/Tell
He used to be a pianist. Tell/Tell
In a concert party on the pier. Tell/Tell
GOLDBERG. Oh yes? Remark/Request
On the pier, eh? Remark/Request
Does he play a nice piano? Request/Request
MEG. Oh, lovely. Opine/Opine
Requests
The chatbots had the most difficulty recognizing Requests that were not in question form, which they classified as Tells, as we can see in Examples 8 and 9. However, all chatbots had remarkable overall results in terms of Requests. ChatGPT o1 correctly identified 242 out of 253 Requests, an accuracy of 96%. This might also be because Requests are often in question form, so they are relatively easy to recognize.
Example 8.
GOLDBERG. You know what I said when this job came up. Request/Tell
I mean naturally they approached me to take care of it. Tell/Tell
And you know who I asked for? Request/Request
MCCANN. Who? Request/Request
Example 9.
MEG. He hasn’t mentioned it. Tell/Tell
GOLDBERG (thoughtfully). Ah! Remark/Request
Tell me. Request/Tell
Are you going to have a party? Request/Request
MEG. A party? Request/Request
Other Speech Acts
Other speech act categories yielded fewer than ten misclassifications each across the entire text. Furthermore, the reasoning behind the mistakes is often similar to the above: the chatbots were unable to recognize the pertinent context. Some Tells, for example, were misclassified as Opines, usually because of emotive language in the utterances. Interestingly, none of the chatbots (excluding one instance in the case of Gemini 2.0 PRO Experimental in Scenario 2) managed to recognize any of the Minimizes in the play, which were usually misclassified as Opines or Tells. Suggests were misclassified as Requests, which is not surprising, considering it is sometimes difficult to articulate the difference between the two.
We use the criterion of benefit for the speaker for Requests and benefit for the hearer for Suggests, but more complex cases, which might benefit both the speaker and the hearer, complicate things and require additional ad hoc criteria. Thanks were misclassified as Opines or Tells, which is also due to a failure to grasp the necessary context of the text. On the bright side, Leave-Takes, Welcomes, and How-are-yous were classified with 100% accuracy by all chatbots in Scenarios 2 and 3. All chatbots also had high accuracy in recognizing other ritualistic speech acts, such as Greets, yet only ChatGPT o1 (in Scenarios 2 and 3) managed to recognize one instance of another ritualistic speech act, Extractor.
5 Conclusion
The purpose of this article was to determine the viability of using different GenAI chatbots for the automated speech act annotation of texts for pragmatic purposes. We sought to answer the following questions:
1. How successful are chatbots in providing an automatically annotated text in line with the instructed speech act typology?
2. Which scenario yields the best outcome (the highest fidelity to manual annotation)?
3. Are chatbots useful for performing such preliminary annotations or, at the very least, facilitating this process?
In line with this, we can draw the following conclusions. Chatbots were (1) highly successful at annotating the text with the prescribed typology, yielding 75–82% accuracy; however, (2) the prompt engineering that instructs the chatbots does matter: Scenario 1 achieved only 26% (ChatGPT 4o), 29% (DeepSeek-R1), and 63% (Gemini 2.0 Flash) accuracy. Chatbots should therefore be provided with a much smaller reference frame (data provided) within which to operate. Furthermore, more is not always better: the more detailed theory in Scenario 3 yielded slightly worse, though still useful, results.
We believe this to be the result of overfitting, which is consistent with findings from other machine learning studies that indicate diminishing returns once a dataset surpasses a certain threshold (Kaplan et al. 2020). Whether the accuracy could be improved further is a subject for future studies, which should experiment with different prompts, as well as with the quantity and perhaps quality of the theory provided to the chatbots. Finally, we believe that the results warrant a tentative conclusion that (3) chatbots, using prompts such as Scenario 2, can be useful and can facilitate research in pragmatics by providing an automated preliminary annotation of the text. That being said, further research is needed, especially regarding how the chatbots would perform in annotating texts that require further context, such as historical documents. One limitation of this study is that we tested the chatbots on a play, which typically includes the relevant context for the viewer/reader, whereas the annotation of a historical text requires further historical context to classify utterances properly. Whether chatbots are capable of that is subject to further research. Furthermore, it would be useful to test the capabilities of chatbots in annotating and classifying texts in other domains of pragmatics and linguistics in general, such as gambits or ritual frame indicating expressions. Considering the relative similarity between the tasks, our approach could be beneficial in these areas as well.

References

Altmäe, Signe, Alberto Sola-Leyva, and Andres Salumets. 2023. “Artificial intelligence in scientific writing: A friend or a foe?” Reproductive BioMedicine Online 47 (1): 3–9. https://doi.org/10.1016/j.rbmo.2023.04.009.
Chen, Xi, Jun Li, and Yuting Ye. 2024. “A feasibility study for the application of AI-generated conversations in pragmatic analysis.” Journal of Pragmatics 223:14–30. https://doi.org/10.1016/j.pragma.2024.01.003.
Chowdhery, Aakanksha, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, et al. 2022. “PaLM: Scaling language modeling with pathways.” arXiv. https://doi.org/10.48550/arXiv.2204.02311.
DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, et al. 2025. “DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning.” arXiv. https://doi.org/10.48550/ARXIV.2501.12948.
Edmondson, Willis, and Juliane House. 1981. Let’s Talk, and Talk about It: A Pedagogic Interactional Grammar of English. Urban & Schwarzenberg.
Edmondson, Willis J., Juliane House, and Daniel Z. Kádár. 2023. Expressions, Speech Acts and Discourse: A Pedagogic Interactional Grammar of English. Cambridge University Press.
Else, Holly. 2023. “Abstracts written by ChatGPT fool scientists.” Nature 613 (7944): 423. https://doi.org/10.1038/d41586-023-00056-7.
Gavez, Urša. 2016. “The reception of Harold Pinter’s plays in Slovenia between 1999 and 2014.” ELOPE: English Language Overseas Perspectives and Enquiries 13 (2): 51–61. https://doi.org/10.4312/elope.13.2.51-61.
Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, et al. 2024. “Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context.” arXiv. https://doi.org/10.48550/ARXIV.2403.05530.
Hazemali, David, Janez Osojnik, Tomaž Onič, Tadej Todorović, and Mladen Borovič. 2024. “Evaluating chatbot assistance in historical document analysis.” Moderna arhivistika 7 (2): 53–83. https://doi.org/10.54356/ma/2024/biub3010.
House, Juliane. 1996. “Developing pragmatic fluency in English as a foreign language: Routines and metapragmatic awareness.” Studies in Second Language Acquisition 18 (2): 225–52. https://doi.org/10.1017/S0272263100014893.
House, Juliane, and Dániel Z. Kádár. 2023. “Studying small talk from a pragmatic angle: An introduction.” Acta Linguistica Academica 70 (4): 411–18.
https://doi.org/10.1556/2062.2023.00704.
House, Juliane, Dániel Z. Kádár, Tadej Todorović, Matjaž Klemenčič, David Hazemali, Tomaž Onič, and Katja Plemenitaš. 2024. “Capturing power in diplomatic language use: The case of a closed-door mediatory negotiation and its aftermath during the breakup of the former Yugoslavia.” Journal of Language and Politics. https://doi.org/10.1075/jlp.24036.hou.
Hribar, Darja. 2004. “Harold Pinter in Slovene translation.” ELOPE: English Language Overseas Perspectives and Enquiries 1 (1–2): 195–208. https://doi.org/10.4312/elope.1.1-2.195-208.
Kádár, Dániel Z., Juliane House, Tadej Todorović, Tomaž Onič, David Hazemali, Katja Plemenitaš, and Donathan Brown. 2024. “The language of diplomatic mediation – A case study of an emergency meeting in the wake of the Yugoslav wars.” Language & Communication 96: 54–66. https://doi.org/10.1016/j.langcom.2024.02.004.
Kaplan, Jared, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. 2020. “Scaling laws for neural language models.” arXiv. https://doi.org/10.48550/arXiv.2001.08361.
Kuhn, Harold William. 1955. “The Hungarian method for the assignment problem.” Naval Research Logistics Quarterly 2 (1–2): 83–97.
Liu, Shiyu, Juliane House, and Dániel Z. Kádár. 2024. “Bargaining in Chinese livestream sales events.” Discourse, Context & Media 60:100787. https://doi.org/10.1016/j.dcm.2024.100787.
Mercer, Sarah, Samuel Spillard, and Daniel P. Martin. 2025. “Brief analysis of DeepSeek R1 and its implications for generative AI.” arXiv. https://doi.org/10.48550/ARXIV.2502.02523.
Mohar, Tjaša, Sara Orthaber, and Tomaž Onič. 2020. “Machine translated Atwood: Utopia or dystopia?” ELOPE: English Language Overseas Perspectives and Enquiries 17 (1): 125–41. https://doi.org/10.4312/elope.17.1.125-141.
Monteiro, Mateus De Souza, Vinícius Carvalho Pereira, and Luciana Cardoso De Castro Salgado. 2023.
“Investigating Politeness strategies in chatbots through the lens of conversation analysis.” In Proceedings of the XXII Brazilian Symposium on Human Factors in Computing Systems, Maceió, Brazil, 1–12. Association for Computing Machinery. https://doi.org/10.1145/3638067.3638068.
Moors, Sarah. 2025. “DeepSeek changes everything we thought we knew about building smart machines.” Digital Health Insights, January 29. https://www.dhinsights.org/news/deepseek-changes-everything-we-thought-we-knew-about-building-smart-machines.
Onič, Tomaž. 2016. “Slogovne značilnosti … [premolk] … Pinterjevega dialoga.” Primerjalna književnost 39 (2). https://ojs-gr.zrc-sazu.si/primerjalna_knjizevnost/article/view/6367.
Onič, Tomaž, and Nastja Prajnč Kacijan. 2020. “Repetition as a means of verbal and psychological violence in interrogation scenes from contemporary drama.” Ars & Humanitas 14 (1): 13–26. https://doi.org/10.4312/ars.14.1.13-26.
Pinter, Harold. 1991. Plays One. Faber & Faber.
Podbevšek, Katarina, and Nina Žavbi. 2021. “Jezikovna norma v luči odrske govorne estetike.” Jezik in Slovstvo 66 (2–3): 145–56. https://doi.org/10.4312/jis.66.2-3.145-156.
Raffel, Colin, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2023. “Exploring the limits of transfer learning with a unified text-to-text transformer.” arXiv. https://doi.org/10.48550/arXiv.1910.10683.
Rane, Nitin, Saurabh Choudhary, and Jayesh Rane. 2024. “Gemini versus ChatGPT: Applications, performance, architecture, capabilities, and implementation.” SSRN Scholarly Paper. Social Science Research Network. https://doi.org/10.2139/ssrn.4723687.
Schechner, Richard. 1966. “Puzzling Pinter.” The Tulane Drama Review 11 (2): 176–84. https://doi.org/10.2307/1125196.
Setlur, Vidya, and Melanie Tory. 2022.
“How do you converse with an analytical chatbot? Revisiting Gricean maxims for designing analytical conversational behavior.” arXiv. https://doi.org/10.48550/ARXIV.2203.08420.
Stokel-Walker, Chris. 2023. “ChatGPT listed as author on research papers: Many scientists disapprove.” Nature 613 (7945): 620–21. https://doi.org/10.1038/d41586-023-00107-z.
Taguchi, Naoko, and Dániel Z. Kádár. 2025. “Pragmatics: An overview.” In The Encyclopedia of Applied Linguistics, edited by Carol A. Chapelle, 1st ed., 1–8. Wiley. https://doi.org/10.1002/9781405198431.wbeal1338.pub2.
Wang, Kevin, Junbo Li, Neel P. Bhatt, Yihan Xi, Qiang Liu, Ufuk Topcu, and Zhangyang Wang. 2024. “On the planning abilities of OpenAI’s O1 models: Feasibility, optimality, and generalizability.” arXiv. https://doi.org/10.48550/ARXIV.2409.19924.
Williams, Iwan, and Tim Bayne. 2024. “Chatting with bots: AI, speech acts, and the edge of assertion.” Inquiry: 1–24. https://doi.org/10.1080/0020174X.2024.2434874.

Appraisal Analysis and AI Chatbots: Do We Even Need Humans?

ABSTRACT

Artificial intelligence (AI) is rapidly transforming various fields, including linguistics, by offering new tools for the analysis and generation of human language. As AI tools, particularly chatbots, have become increasingly sophisticated, questions have arisen about their capacity to replicate complex human linguistic processes, such as those covered by the appraisal framework developed by Martin and White (2005). The appraisal framework examines how three main categories – attitude, graduation, and engagement – are expressed in discourse at the semantic level. This paper investigates how the AI chatbots MS Copilot, ChatGPT, and Claude approach appraisal analysis in a selected text, highlighting similarities and notable differences in comparison to human analysis.
The findings, although based on the analysis of a single text, provide valuable insights into the advantages and drawbacks of AI in mimicking human-like appraisal analysis, which might be beneficial when conducting appraisal research.

Keywords: appraisal, human and AI comparative analysis, ChatGPT, MS Copilot, Claude

Analiza jezika vrednotenja in pogovorni sistemi: ali ljudi sploh potrebujemo?

IZVLEČEK

Umetna inteligenca hitro preoblikuje različna področja, vključno z jezikoslovjem, tako da ponuja nova orodja za analizo in ustvarjanje človeškega jezika. Ker postajajo orodja umetne inteligence, zlasti pogovorni sistemi, vse bolj izpopolnjena, se pojavljajo vprašanja o njihovi sposobnosti ponovitve kompleksnih človeških jezikovnih procesov, kot so tisti, zajeti v jeziku vrednotenja, ki sta ga razvila Martin in White (2005). Okvir jezika vrednotenja preučuje, kako se v diskurzu izražajo tri glavne kategorije – odnos, stopnjevanje odnosov in vključenost – na semantični stopnji. Članek raziskuje, kako pogovorni sistemi MS Copilot, ChatGPT in Claude pristopijo k analizi jezika vrednotenja v izbranem besedilu, tako da osvetli podobnosti kot tudi pomembne razlike skozi primerjavo s človeško analizo. Ugotovitve, čeprav temeljijo na enem izbranem besedilu, omogočijo dragocen vpogled v prednosti in pomanjkljivosti umetne inteligence pri posnemanju človeške jezikovne analize, kar je lahko koristno pri raziskovanju jezika vrednotenja.

Ključne besede: jezik vrednotenja, primerjalna človeška analiza in podprta z umetno inteligenco, ChatGPT, MS Copilot, Claude

2025, Vol. 22 (1), 35–52(228) journals.uni-lj.si/elope https://doi.org/10.4312/elope.22.1.35-52 UDC: 81:004.89

Agata Križan, Aja Barbič, University of Maribor, Slovenia

1 Introduction

The world has recently witnessed immense progress in the development of generative artificial intelligence (GenAI).
A subset of AI known as large language models (LLMs) comprises machine models trained on vast amounts of data to understand and generate natural language and other types of content. This renders them capable of performing a diverse array of tasks, from writing essays and creating articles to answering questions and analysing texts, thereby contributing to language research in a transformational way. Given their extensive training data and their multilayer, transformer-based neural networks, these AI tools can produce texts whose language closely resembles that of humans, often mimicking human communication. For the purposes of this study, the chatbots ChatGPT, MS Copilot, and Claude were used. ChatGPT was launched in 2022 by OpenAI; MS Copilot (MS being short for Microsoft) was released in 2023; and Claude was developed by the research firm Anthropic and released in 2023. All three AI chatbots were trained to follow prompts and provide responses, use up-to-date information, have conversational abilities, understand context, and possess broad knowledge. Since one capacity of AI chatbots is to perform complex analyses, the aim of this paper is to contrast the analysis of appraisal in a selected text as provided by ChatGPT, MS Copilot, and Claude with that provided by human analysts. This study is particularly pertinent since, while appraisal theory is widely researched and has been effectively applied to various texts and genres, qualitative research on the analysis of appraisal as generated by AI chatbots is almost non-existent. ChatGPT has been applied in many research fields, including translation and language studies. Orel Kos (2024) examined the role of LLM-powered machine translation in subtitling instruction, revealing significant differences between students who relied on AI-generated translations and those who produced subtitles manually.
The study highlights the challenges of multimodal awareness, since post-editing AI-generated subtitles requires careful human intervention to ensure accuracy and contextual appropriateness. The first historical review of ChatGPT's performance across various domains established that, despite its many effective applications, ChatGPT still has limitations (Shahriar and Hayawi 2023), which will likely be addressed in newer versions. In this review, ChatGPT's responses to some of the researchers' questions are analysed. Furthermore, several studies have examined ChatGPT's applicability to researching language and language learning. Tica and Krsmanović (2024) explored student perceptions of ChatGPT in ESP (English for Specific Purposes) writing, revealing that while users appreciate its speed and accuracy, they remain divided on its overall effectiveness. Investigating the advantages of corpora and corpus tools over generative artificial intelligence in data-driven learning, Crosthwaite and Baisa (2023) highlighted advantages that corpora still hold over GenAI, such as knowledge of the data, authenticity, replicability, multimodality, safety, active learning, and the absence of hallucinations, while GenAI has the potential to successfully address issues that corpus research has faced. For a more comprehensive understanding of language usage and patterns, the authors argue, a combination of both tools is necessary. Uchida (2024) compared search results from ChatGPT and a large-scale general corpus (COCA), focusing on word frequency lists, collocations, identification of genres, and words fitting certain grammatical patterns. The quantitative results showed that ChatGPT successfully completed most of these tasks, i.e., it identified general linguistic trends and can thus effectively assist in language learning.
The study by Curry, Baker, and Brookes (2024) shows that ChatGPT performs satisfactorily in the semantic categorisation of keywords, although the categories were mainly surface-level, but fails in the analysis of concordances and of function-to-form mapping. Additionally, the study shows that it has (for now) certain limitations for more fine-grained corpus research and does not meet the standards of a human analyst. Imamović et al. (2024) assessed ChatGPT's potential for annotating subcategories of attitude by using 11 TED Talk texts and applying Martin and White's (2005) appraisal theory. The results of this quantitative study show that ChatGPT was successful at identifying linguistic items in the text that carry evaluative meaning. However, the recall was very low, and detailed labelling with categories was incorrect compared with a human annotator. Moreover, an evaluation by Lozić and Štular (2023) of the capabilities of AI chatbots, including Claude 2, in generating scholarly content in the humanities and archaeology has shown that while LLMs have transformed content generation, their ability to produce original scientific contributions in the humanities remains limited. Research by Koeva (2024) revealed that the LLMs Claude 3.5 Sonnet, Gemini 1.5 Pro, GPT-4o, and GPT-4o mini could assist in linguistic research, despite errors.

2 Appraisal

As an interpersonal and evaluative system, appraisal is concerned with the expression of the writer's and speaker's attitudes and emotions towards propositions, as well as with positions towards communicative events and other voices. It is thus concerned with evaluation in written and spoken discourse. According to Hunston and Thompson, evaluation is defined as “the broad cover term for the expression of speaker or writer’s attitude or stance towards, viewpoint on, feelings about the entities or propositions that he or she is talking (or writing) about” (1999, 5).
Functions of evaluation include the expression of the speaker's or writer's opinion, hence a reflection of the value system; the construction and maintenance of relations between the participants in a written or spoken event; and the organisation of the discourse (ibid.). Despite its evaluative nature, the term 'appraisal' is used to emphasise its discourse-semantic aspect (Martin and White 2005). Appraisal systematically covers three domains at the level of discourse semantics: attitude, graduation, and engagement. The domain of attitude is further divided into affect, which deals with language that expresses emotions (e.g., anxious); judgement, dealing with language evaluating people's behaviour and character (e.g., clever); and appreciation, addressing language aesthetically evaluating things, objects, events, and phenomena (e.g., unique). According to Martin and White (2005), attitude can be inscribed (explicitly/overtly expressed), i.e., encoded in attitudinal lexis, or evoked (implicitly/covertly expressed), i.e., implied via ideational meanings and/or co(n)text. Attitude can be positive or negative. Graduation is concerned with the gradability of attitudes. It adjusts the degree of an evaluation (force, e.g., very popular) or the strength of boundaries (focus, e.g., true happiness) (ibid., 37). Engagement deals with the dis/alignment of writers'/speakers' positions and voices with those referenced in the text and by other voices (e.g., seem) (ibid., 34–35). From the perspective of systemic functional linguistics (SFL), appraisal construes the interpersonal metafunction, which is concerned with how participants interact with one another, influence the behaviour of others, construct and fill social roles, adopt attitudinal and evaluative positions, and form bonds, relationships, and alliances (White 2000, 4).
In other words, the interpersonal metafunction of language is defined as “language as action” (Halliday and Matthiessen 2004, 30). The appraisal framework, as developed by Martin and White (2005), has proven useful in systematically revealing the (inter)connection of all subsystems of appraisal language in various texts and genres, contributing to an understanding not only of the evaluative component of texts but also of the social one, as well as of how and why texts mean what they do. With the development and growing capacity of AI, particularly chatbots, appraisal can undoubtedly be analysed by AI. The question is simply how successfully and in which manner – independently or with human assistance.

3 Methodology

This article explores the potential of ChatGPT, MS Copilot, and Claude to identify/annotate instances of appraisal in a selected text, using the appraisal system. It highlights discrepancies and similarities between the three and contrasts them with an analysis performed by a human. Linguistic annotation is vital for a sophisticated exploration of language, providing insights into language use. Annotation can be applied at various levels, including phonetic, prosodic, grammatical, semantic, and pragmatic/discursive (Leech 1993). For the purposes of this analysis, which took place between November 22, 2024 and February 12, 2025, a text was chosen at random, subject to certain prerequisites: the article had to come from a serious newspaper, appear in a current issue, be of average length, be available online, and contain at least some evaluative language. It was selected from the globally renowned British daily newspaper The Guardian and addresses UK universities asking the government to restart the flow of EU students to Britain after Brexit and a return to the Erasmus student exchange programme. The total number of words in the article is 2,496. Freely accessible AI chatbots were used.
The instances of appraisal in the text were identified by two human analysts proficient in appraisal theory (i.e., the authors of this paper, hereafter referred to as human annotators1). The double coding increased objectivity, and the annotation included tags for affect, judgement, and appreciation (subcategories of attitude), explicit/implicit (attitudinal realisation), positive/negative (attitudinal status), graduation, and engagement. After independent coding, the annotations were compared. When the coding differed or an appraisal was not identified at all, the case was discussed with reference to Martin and White's appraisal typology, the analysts' knowledge of appraisal theory and coding experience, and co(n)text. Where necessary, a dictionary was used to check definitions, and where possible, double and even multi-coding of appraisals was accepted for the sake of accuracy and greater objectivity.

1 In this study, the 'human annotator' analyses (identifies/annotates) appraisals according to Martin and White's appraisal typology, while the 'user' is a human using AI for the appraisal analysis via prompting.

For communication with the chatbots, the users employed prompt engineering (i.e., carefully crafting instructions and questions for the chatbots). The human annotators decided on initial prompts that focused solely on the appraisal (analysis) of the given text. The number of prompts for the first chatbot (MS Copilot) was 40, while for the other two it varied slightly, depending on the responses and thus on subsequent prompts. The first prompts targeted the chatbots' knowledge of and familiarity with appraisal theory, instructing them to analyse/identify appraisals in the entire text. Subsequent prompts depended on the preceding responses. First, the prompting was performed for MS Copilot until sufficient data was gathered; then the same or similar prompts, depending on responses,2 were used for ChatGPT and Claude.
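The adaptive prompting workflow described above (initial prompts first, then follow-ups conditioned on what the previous response omitted) can be sketched schematically. The `ask` function and the canned responses below are hypothetical stand-ins, not the study's actual prompts or chatbot outputs:

```python
# Schematic sketch of adaptive prompting: issue initial prompts, then a
# follow-up for every major appraisal category the last response omitted.
# `ask` and the canned responses are invented stand-ins for a real chatbot.

INITIAL_PROMPTS = [
    "What is appraisal theory as developed by Martin J. R. and White P. R. R.?",
    "Identify and categorise appraisals based on Martin and White's appraisal theory.",
]
EXPECTED = {"attitude", "graduation", "engagement"}  # major appraisal categories

def ask(prompt, canned_responses):
    # Stand-in for a chatbot call: look up a pre-recorded response.
    return canned_responses[prompt]

def run_session(canned_responses):
    transcript = []
    for prompt in INITIAL_PROMPTS:
        transcript.append((prompt, ask(prompt, canned_responses)))
    # Issue a follow-up for each major category missing from the last response.
    last = transcript[-1][1].lower()
    for missing in sorted(cat for cat in EXPECTED if cat not in last):
        follow_up = f"Please also identify instances of {missing} in the text."
        transcript.append((follow_up, ask(follow_up, canned_responses)))
    return transcript

# Invented responses: the second mentions attitude only, so the session
# issues follow-ups for engagement and graduation.
canned = {
    INITIAL_PROMPTS[0]: "Appraisal comprises attitude, graduation and engagement.",
    INITIAL_PROMPTS[1]: "Attitude: 'hopeful' (explicit positive affect).",
    "Please also identify instances of engagement in the text.": "Engagement: 'seem'.",
    "Please also identify instances of graduation in the text.": "Graduation: 'very'.",
}
print(len(run_session(canned)))  # 4 turns: 2 initial prompts + 2 follow-ups
```

The point of the sketch is the branching: which prompt is issued next depends entirely on what the previous response covered, mirroring the protocol used with the three chatbots.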
After the chatbots were prompted to define appraisal theory, they were asked to analyse the text in terms of appraisals. Since no comprehensive analysis was provided (e.g., lack of implicit attitudes, clarity, or systematic exemplification), additional, more specific prompts or clarification requests were used. Next, the human annotators divided the text into multiple parts and asked the chatbots to analyse appraisals in these shorter parts. If a chatbot provided a comprehensible answer, no additional prompt was needed. For example, one of the prompts asked about implicit attitudes in the text, and both ChatGPT and Claude answered that graduation and engagement could be expressed implicitly, even though only attitudes can be explicit or implicit. An additional prompt was thus used to ask both chatbots about the source of such information and for an explanation. Another prompt asked the chatbots to identify engagement in a sentence which, based on Martin and White's appraisal theory, used an engaging item. ChatGPT and Claude categorised this item as such, whereas MS Copilot did not recognise it as appraisal, so an additional prompt was used to ask this chatbot specifically about this item. Here are some examples of prompts:

What is appraisal theory as developed by Martin J. R. and White P. R. R.?
Identify and categorise appraisals based on Martin and White’s appraisal theory.
Analyse the given text in terms of appraisals.
Please analyse the following sentence in more detail. (the sentence was provided)
What about ‘toxic’, isn’t ‘toxic’ attitudinal? (referring to the AI tool’s previous response)
Analyse in more detail. (referring to the AI tool’s previous response)
Can ‘chief executive Viviene Stern’ be identified as judgement targeting the responsible position she holds, as well as graduation?
2 There was, for example, no need to use the subsequent prompt asking a particular chatbot for alternative coding when its coding matched the human annotators', even though such a prompt was used for another chatbot whose coding did not match the human annotators'.

Once the prompting was finished for all three chatbots, each response and every coded instance provided by the chatbots were carefully compared with the human analysis, as well as across the three chatbots, in terms of discrepancies and similarities. Given the brevity of this paper, only some discrepancies and similarities are highlighted and illustrated. For the purposes of analysis and comparison, a qualitative research method was used. The methodology in this study is similar to that applied by Hazemali et al. (2024), who used structured questioning to evaluate the GPT-3.5-powered PDFGear Copilot's ability to interpret historical documents. Their study found that while the chatbot performed well in factual retrieval, it struggled with deeper content interpretation. Given that appraisal analysis also requires a nuanced understanding of context and evaluative meaning, similar challenges were expected to arise when AI tools were used in this domain.

4 Comparative Analysis and Findings

The initial prompts revealed that ChatGPT and MS Copilot prioritised extended phrases rather than discrete instances of appraisal, whereas the human annotators and Claude concentrated on analysing individual instances. Only after additional prompting did ChatGPT and MS Copilot begin to highlight individual instances, particularly in relation to graduation and engagement. The responses generated by ChatGPT and Claude were typically more elaborate and specific than those of MS Copilot, providing a summary at the end and highlighting the main points.
All three chatbots organised their analyses systematically, arranging responses around distinct categories. Comprehensive and nuanced analyses from the AI tools, particularly from ChatGPT and MS Copilot, frequently required additional prompting that suggested alternative coding, requested the coding of certain instances absent from the AI analysis, or asked for clarification. In comparison, Claude needed less additional prompting. Surprisingly, the chatbots identified fewer appraisals than the human annotators, especially when dealing with the whole text. A possible explanation is that only some examples were listed; however, in response to one prompt, MS Copilot stated that those were the examples. If the listed appraisals were examples only, this could be perceived as a disadvantage, as it demanded not only additional prompts but also carefully structured ones. If the listed examples were all the identified appraisals, this could likewise be perceived as a disadvantage, as the number of identified appraisals was mostly much lower than that identified by the human annotators. The initial prompts assessing the AI chatbots' familiarity with appraisal theory and requesting an appraisal analysis of the selected text revealed some discrepancies, primarily concerning the length and structure of responses. Claude provided a lengthy and detailed overview of the appraisal framework. Interestingly, graduation and engagement, two major categories alongside attitude, were absent from MS Copilot's appraisal analysis but were included in ChatGPT's and Claude's. For the sake of clarity, a subsequent prompt was more specific, inviting the chatbots to identify and categorise (instead of 'analyse', which was used in the initial prompt) appraisals based on Martin and White's model (2005), and specifically demanding the analysis of explicit, implicit, positive and negative attitudes, as well as attitudinal targets (emoters for affect), instead of simply attitudes.
Some notable discrepancies in attitudinal realisation, categorisation, and attitudinal status emerged between ChatGPT, MS Copilot, Claude, and the human annotators, as exemplified in (1–3).

(1) UK universities …, but hopeful amid talks on youth mobility
(2) We also get a tiny bit uncomfortable
(3) British universities say they … are adopting a “watch and wait” approach

In (1), hopeful was categorised by MS Copilot as implicit negative affect, whereas the human annotators coded it as explicit positive affect (the universities' hope and optimism). The implicit realisation of the attitude seemed problematic, since hopeful is clearly attitudinal lexis. Similarly, MS Copilot coded uncomfortable in (2) as implicit negative affect, while the human annotators coded it as explicit negative affect, since the word conveys the feeling of unease directly rather than indirectly. Claude's analysis did not include hopeful, although it is clearly attitudinal, but included uncomfortable as explicit negative affect. However, when the analysis focused solely on the sentence, Claude identified hopeful as positive appreciation, although dictionaries define it as a feeling. In (3), watch and wait was identified as implicit negative affect by MS Copilot (in the sense of monitoring without any action), whereas the human annotators coded it as implicit positive judgement, targeting the British universities' behaviour based on their cautious and patient approach to avoid undesirable political conflict. Claude did not initially identify this instance as appraisal; only after additional prompting focusing on implicit attitudes and their categorisation did it identify it as implicit negative affect. Interestingly, before the prompt asking for the categorisation of the implicit watch and wait, Claude had characterised the phrase's implicitness as an underlying urgency, which was unclear. This exemplifies the differences in attitudinal categorisation and status.
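Once each coder's labels are recorded, discrepancies such as those in (1–3) can be tabulated and flagged automatically. A simplified sketch, with the codings condensed from the cases discussed above and `None` marking an item a chatbot did not identify as appraisal at all:

```python
# Simplified sketch of flagging coder disagreements against the human coding.
# The codings condense the cases in (1-3); None marks an unidentified item.

codings = {
    # item: (human annotators, MS Copilot, Claude)
    "hopeful":        ("explicit positive affect",    "implicit negative affect", None),
    "uncomfortable":  ("explicit negative affect",    "implicit negative affect",
                       "explicit negative affect"),
    "watch and wait": ("implicit positive judgement", "implicit negative affect",
                       "implicit negative affect"),
}

def disagreements(codings):
    """Map each item to the chatbot codings that diverge from the human coding."""
    out = {}
    for item, (human, *machine) in codings.items():
        diverging = [c for c in machine if c != human]
        if diverging:
            out[item] = diverging
    return out

for item, diffs in disagreements(codings).items():
    print(item, "->", diffs)
```

Recording the three dimensions separately (realisation, category, status) would additionally show whether a disagreement concerns explicitness, the attitude subcategory, or polarity.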
While ChatGPT and MS Copilot did not identify good in good students as appraisal, the human annotators and Claude did. The human annotators categorised good as judgement, the obvious target being the students' capability, whereas Claude initially categorised it as appreciation, despite identifying European students as the target. After a subsequent prompt, Claude finally identified good students as explicit positive judgement. Claude (after the more specific prompt mentioned above) and the human annotators also identified disproportionately in burden rested disproportionately as implicit negative appreciation; Claude merely listed it among implicit attitudes together with the target, whereas the human annotators identified the implicitness based on a prior identification of disproportionately as graduation.

(4) It was absolutely fantastic that youth and students were ‘central’ to the discussion about the reset in relations with the EU.

Although ChatGPT and MS Copilot provided examples in full sentences or longer phrases when invited to analyse the entire text, both chatbots highlighted only fantastic as explicit appreciation in (4). In the analysis of the entire text, all three chatbots and the human annotators identified absolutely fantastic as appreciation, whereas later, when asked about implicit attitudes, MS Copilot identified it as implicit affect conveying optimism and hope. When the analysis focused solely on sentence (4), ChatGPT and MS Copilot identified absolutely fantastic as affect in the sense of enthusiasm and approval, with ChatGPT, interestingly, identifying it as explicit affect. Claude, in contrast, identified the phrase as positive appreciation from the outset.
Based on Martin and White’s (2005, 56) examples of appreciation, fantastic is a positive reaction to the recognition of youth and students in the discussion rather than an emotion that someone feels. The human annotators and Claude also identified central as appreciation in the sense of primary importance, as well as graduation and engagement. However, Claude’s identification of appreciation and engagement occurred only when the individual sentence containing this word was the focus. Only after additional prompting, which was not specifically targeted at central, did one of ChatGPT’s analyses include central as positive appreciation and graduation. Additionally, the human annotators coded it as implicit positive judgement targeting politicians for giving the issue primary attention during the meeting.

(5) It’s not in our interest for the government to end up caught in a kind of toxic debate about immigration domestically.

Regarding attitudinal realisation, toxic in (5) was identified by the human annotators as explicit negative appreciation targeting the debate, whereas Claude categorised toxic debate as implicit negative appreciation of political discourse. Despite its obvious attitudinal significance, neither ChatGPT nor MS Copilot listed this as an example of attitude. Although ChatGPT’s explanation of implicit attitudes as “attitudes [that] are often subtle and rely on the reader’s interpretation of what is implied rather than explicitly declared” is valid, questions arise regarding whose interpretation (voice) is involved in the analysis of implicit attitudes, given that chatbots gather information from the internet, and likely from analyses of appraisal conducted by various human analysts in a variety of contexts. For example, the status of the same word may vary depending on the context.
However, since each text is written with an ‘ideal’ reader in mind (Kress 1988, 107), that is, a description of the reading position to which the actual reader is invited to conform (Macken-Horarik 2003), it is possible that the reading position of the actual reader does not match that of the ideal reader. This may also happen with analysts, and such misalignment may affect the identification of implicit attitudes. Consequently, the identification of implicit attitudes may vary between a human annotator and AI, and between a human annotator and an author. To minimise subjectivity in the analysis, double or multiple coding is necessary to exhaust the possible multiple interpretations (Page 2003). Examining hints such as graduation and engagement (if used), in addition to contextual knowledge, may be helpful in identifying implicit attitudes. ChatGPT, however, when specifically asked about the presence of implicit attitudes after the initial analysis, listed some that were not identified by the human annotators. While this may suggest a more fine-grained analysis than that of its human counterparts, this is not necessarily the case. Some implicit attitudes were identified individually to fit the overall pattern of evaluation across the text, namely the positive evaluation of the pre-Brexit youth and student exchange programmes that UK universities would like to reinstate, whereas the human annotators focused more on wording, as suggested by Thompson (2008). Interestingly, regret in we really, really regret the fact was identified as an implicit attitude by ChatGPT and MS Copilot, whereas the human annotators and Claude coded it as an explicit attitude, considering it attitudinal lexis.3 After challenging ChatGPT and MS Copilot with a prompt reminding them of their preceding classification, both accepted the coding, praising the human annotators’ coding skills.
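The double or multiple coding advocated above can itself be checked quantitatively. As a minimal, purely illustrative sketch (the labels below are hypothetical, not the study’s actual annotations), Cohen’s kappa measures how far two annotators’ category decisions agree beyond chance:

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two annotators' category labels over the same items."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    # observed agreement: proportion of items with identical labels
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # chance agreement, from each coder's marginal label distribution
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# hypothetical appraisal labels for six attitudinal instances
human = ["affect", "judgement", "appreciation", "affect", "judgement", "affect"]
chatbot = ["affect", "appreciation", "appreciation", "affect", "judgement", "judgement"]
print(round(cohens_kappa(human, chatbot), 2))  # prints 0.5
```

A kappa near 1 indicates near-perfect agreement; values in the middle range, as in this toy example, would signal exactly the kind of categorisation discrepancies described here.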
ChatGPT even provided the reason for its explicitness, but still insisted on the presence of implicit affect, explaining rather opaquely that “… the text subtly implies dissatisfaction with these outcomes while maintaining neutrality.” Certain discrepancies were also observed with regard to graduation and engagement. The issue of dealing with whole sentences or phrases was noticeable again when MS Copilot identified the phrase flow of really good European students as graduation, whereas the human annotators and Claude highlighted the individual items flow of and really, which is more precise. Interestingly, MS Copilot identified extremely in extremely important as focus, although it upgrades an explicit attitude (appreciation), and did not recognise the repetition of the intensifier really in really, really regret as graduation, unlike the human annotators, Claude, and ChatGPT. For an accurate quantitative and qualitative analysis, such details are indispensable.

(6) EU data for 2020 shows that 17,795 students came to the UK in 2018/2019, almost double the number of British students, 9,908, that went to the EU. The previous year, 18,839 EU citizens came to British universities compared with 9,540 going to the EU.

Moreover, in (6), MS Copilot provided the unusual explanation that the phrase almost double the number quantified the comparison, although comparison of attitudes is typically regarded as a source of intensification; how can a comparison be quantified? In a prompt requesting the analysis of appraisals, MS Copilot identified the phrase almost double the number as engagement, referring to it as comparison in the explanation, as well as graduation (intensification), whereas ChatGPT identified it as graduation via quantification and engagement.
While the phrase unequivocally conveys graduation via quantification (amount), as rightly recognised by the human annotators and Claude, and, based on the co-text, also via comparison (intensification), its identification as engagement (contrast) by ChatGPT seemed ambiguous. Although it may imply contrast based on the co-text, engagement in the appraisal framework, including counter-expectancy (e.g., although, despite, still), is primarily expressed via grammatical elements, if counter-expectancy was what ChatGPT meant by contrast. The identification of the above phrase solely as graduation seems to reflect greater awareness of engagement and its grammatical realisation on the part of the human annotators and Claude than of ChatGPT and MS Copilot. Claude, for example, did not identify possible in possible return to the Erasmus student exchange programme as engagement, as the human annotators did based on modality, but as graduation, which is also reasonable when the word is regarded as a downgraded version of certainty, something the human annotators should have taken into consideration. However, when engagement in the individual sentence later became the focus of analysis, Claude identified it as engagement under modality, which was one of Claude’s listed sources of engagement. Such examples illustrate that responses provided by chatbots should be carefully studied prior to use in appraisal research. Furthermore, in the initial analysis, only Claude and the human annotators identified data as graduation in (6).

3 Regret is defined as a feeling of sadness, distress, and/or disappointment (https://dictionary.cambridge.org/dictionary/english/regret, https://www.merriam-webster.com/dictionary/regret, https://www.collinsdictionary.com/dictionary/english/regret).
Additionally, the human annotators treated this graduation as a mechanism contributing to the identification of implicit negative appreciation targeting the (disproportional) exchange situation for the British students. Similarly, Claude identified the specific numbers as implying the scale of loss, which may also be interpreted as an attitudinal trigger. When ChatGPT and MS Copilot, which had not identified any implicit attitudes, were challenged with a specific prompt asking whether the issue of the inflow and outflow of exchange students could be coded as implicit negative appreciation, they both accepted such coding. In contrast, no further prompt was needed for Claude, since it identified an implicit attitude instantly. Interestingly, ChatGPT regarded engagement and graduation as implicit, providing an unusual and unclear explanation that engagement and graduation can carry implicit elements, and thus identified central in (4) as a non-explicit intensifier. The response to the subsequent prompt referred to knowledge of appraisal, highlighting Martin and White’s book The Language of Evaluation: Appraisal in English (2005). However, according to this source (2005, 131–32), with reference to engagement, only pronouncement (as a subcategory of engagement) can be realised as explicit or implicit, whereas such realisation does not occur with graduation. Since, for a more accurate and fine-grained appraisal analysis, non-attitudinal graduation as a potential trigger of certain attitudes in specific contexts (Hood 2004) should also be taken into consideration,4 one of the prompts specifically requested the identification of non-attitudinal graders in the whole text. MS Copilot included really, absolutely, and tiny bit as examples of non-attitudinal graders, although they clearly intensify explicit affect and appreciation.
Furthermore, toxic in toxic Brexit row was identified by ChatGPT as non-graduation when the analysis focused solely on the sentence containing this phrase, although it is clearly an intensified negative attitude, as rightly identified by the human annotators and Claude. In the selected article, material is frequently attributed to external sources connected to politics and universities. The credibility of these sources is occasionally signalled by the important, responsible positions their holders occupy, which the human annotators identified as graduation (e.g., specificity or lexically infused intensification) with attitude-evoking potential; chief executive, for example, may imply positive judgement (the same holds for European Commission president). When all three AI tools were asked about such coding, they agreed with it, with ChatGPT and MS Copilot referring to it as social esteem and capability. ChatGPT even specified that the judgement was implicit, stating that the given title was “implicitly judging her as someone with responsibility, credibility, and a mandate to speak on behalf of Universities UK.” Additionally, Claude also identified it as graduation, like the human annotators. This again shows that human assistance is often needed to obtain a more accurate and clear-cut final appraisal analysis. Furthermore, in accepting the coding decisions provided or suggested by the human annotators, all three AI tools appeared to learn from subsequent prompting and incorporated this knowledge into their subsequent analyses.

4 Graduation, for example, has the potential to evoke certain attitudes and values in advertising (Križan 2016).
When, after additional prompting, MS Copilot included graduation and engagement in the analysis of appraisals, various elements were still excluded from the analysis: the counter-expectancy but, the denial not, the prefixes dis- and un-, the quoted material correction mechanism/central, the reporting verbs believes/says/added/expected/idea of, the modality markers might/would/would have had to/seemed/possible, and because as reason. Likewise, ChatGPT did not provide the above-listed elements as examples of engagement. However, contrary to MS Copilot, ChatGPT referred to individual graders in the explanations provided next to whole propositions (examples). Although denials, including the above-mentioned prefixes, were also excluded from Claude’s initial analysis of engagement, its analysis did include the reporting words believes, says, and shows. After dealing with the whole text, subsequent prompts focused on the identification of appraisals in individual sentences instead of the whole text, in order to observe any difference in responses pertaining to text length. What was immediately noticeable were highlighted individual words or short phrases, which were often absent from the analysis of the entire text.

(7) But as Keir Starmer prepares for his first bilateral meeting with the European Commission president, Ursula von der Leyen, on Wednesday, British universities say they are determined not to provoke a return to the ‘toxic’ Brexit row over migration and are adopting a ‘watch and wait’ approach.

Moreover, both MS Copilot and the human annotators identified the phrase determined not to provoke as positive judgement in (7). However, MS Copilot also identified it as engagement, although it was unclear what exactly this engagement referred to: the determination, the provoking, or the use of the denial closing space for alternative views.
The human annotators also identified the denial not as engagement, but, unlike Claude and MS Copilot, they did not identify determined as graduation, although it may be regarded as such if unpacked as decision + firm. As the analysis showed, there were cases where the AI tools accepted a human coding decision or suggestion, as well as cases where the opposite occurred. Although deciding whether a word is semantically infused with intensification can be difficult, unpacking a word into ___ + more, as well as the use of dictionaries, was helpful in many cases. Since dictionaries are part of the internet, where MS Copilot searches for information, it seemed obvious why it recognised determined as graduation. Based on the intensified decision, paired with the denial of provoking, the human annotators coded the phrase determined not to provoke as judgement, as did MS Copilot and Claude, whereas ChatGPT identified it as implicit affect. However, intense determination not to do something harmful points to positive tenacity/propriety (judgement) rather than feelings. Additionally, the human annotators and Claude identified but in (7) and (8) as engagement. Such coding was also accepted by ChatGPT and MS Copilot following a more specific prompt, after it had been absent from their analyses. This was surprising, because but points to the author’s strong presence in the text. In contrast to all three chatbots, the human annotators also identified the fact as engagement in (8).

(8) “We really, really regret the fact we have lost a flow of really good European students into the UK,” said the chief executive of Universities UK, Vivienne Stern. But she said she recognised the “toxic” domestic politics surrounding the prospect of EU citizens returning at scale to education in the UK.
(9) “We also get a tiny bit uncomfortable when you think that something which is extremely important to us might be bound up in big politics.”

Moreover, the phrase we also get a tiny bit uncomfortable in (9) was regarded as engagement by MS Copilot, which explained this in an unclear way, i.e., that it “acknowledge[s] the speaker’s feelings and allowing for other viewpoints”. What is meant by the acknowledgment of feelings? Does this refer to the alignment of feelings between the author of the text and the external source, hence opening up space for alternative views? The human annotators identified the whole proposition as material attributed to the external source. Despite MS Copilot’s claim that its analysis was detailed, it could not be viewed as such, since certain appraisals identified by the human annotators, such as positive appreciation (important) and engagement (might), were absent from MS Copilot’s analysis. Claude identified might as hedging in the initial analysis, listing hedging next to force and focus under graduation, which was unusual, since attitudes can only be graded in force or focus. It is true that hedges can be used to express degrees of certainty and uncertainty, but this is categorised as engagement (entertain) by Martin and White (2005, 98). The human annotators identified might as engagement. However, when the analysis focused on the individual sentence (9), Claude did identify might as engagement. Furthermore, the human annotators identified also, conveying addition, as graduation in the sense of upgrading the negative emotions felt around the issue of Erasmus exchanges. Moreover, Claude identified recognised in (8) as graduation and as positive judgement of Stern’s diplomatic stance, which the human annotators, ChatGPT, and MS Copilot overlooked. While such coding is certainly reasonable, it can only be implicit judgement, which Claude did not state overtly (in an example before recognised, Claude used implies to signal implicitness).
Unlike the human annotators, none of the AI tools identified the prefix un- as engagement (uncomfortable = not comfortable, i.e., a denial).

(10) “It’s not in our interest for the government to end up caught in a kind of toxic debate about immigration domestically, because in the end that is going to hurt us badly if it drives government to be clamping down on immigration in other ways,” she said.

While ChatGPT, MS Copilot, Claude, and the human annotators identified the phrase hurt us badly in (10) as negative affect, based on the feelings that universities will experience if the debate forces the government to curb immigration in other ways, Claude also identified it as negative judgement targeting “the potential governmental consequences” and thus the government, which also seems a reasonable coding. Moreover, MS Copilot identified the phrase toxic debate as negative judgement, as did Claude in the initial analysis, evaluating the debate (harmful and undesirable), whereas the human annotators coded it as negative appreciation targeting the debate as an inanimate entity and the initial target. Finding the initial target can be an essential element in ascribing categories (Thompson 2014, 58). Although it is obvious that the toxic debate was produced by politicians, and is thus connected to behaviour, the judgement is implied rather than inscribed. In (10), the human annotators also identified other appraisals that were absent from MS Copilot’s analysis, such as about, kind of, and other ways as graduation, and because, not, and if as engagement, with the latter two also identified as such by ChatGPT. Claude, like the human annotators, identified (about) immigration as graduation in terms of specificity. Interestingly, kind of was identified as engagement by Claude and ChatGPT, although both described its softening characterisation. Moreover, when asked specifically about the engaging nature of because, all three chatbots accepted such coding.
Moreover, with ChatGPT’s and Claude’s identification of domestically as graduation as focus, specificity as a source of graduation was likely acknowledged. Additionally, the whole attributed material was identified as engagement by the human annotators, whereas MS Copilot identified only the phrase it’s not in our interest within the attributed material as engagement. MS Copilot’s explanation that the phrase expresses certainty, and thus closes the dialogic space for alternative interpretations, is unclear as to what certainty means here. The denial not in the phrase does show the speaker’s engagement, but the reference to certainty remained unclear.

(11) Speaking in New York on Friday, Starmer seemed to have softened his resistance to the idea of a youth mobility scheme allowing under-30s to return to the EU for working holiday stints.

In (11), seemed to, an important engagement element opening up space for alternative views, was absent from ChatGPT’s and MS Copilot’s analyses, but not from the human annotators’ and Claude’s. The Russian doll effect (Thompson 2014) was observed in Claude’s identification of softened his resistance as both explicit positive judgement and implicit negative judgement, targeting political flexibility and the previous rigid stance, respectively. Additionally, Claude, like the human annotators, identified under-30s and working holiday as graduation, but not EU, which the human annotators coded based on location. Claude and the human annotators, unlike ChatGPT and MS Copilot, also identified softened as graduation, as it clearly downgrades the intensity lexically. The human annotators’ coding of idea as engagement was absent from ChatGPT’s, MS Copilot’s, and Claude’s analyses, although it clearly introduces an external source. Interestingly, Claude identified the idea in another sentence as engagement, although both cases indicate material attributed to an external source.
Moreover, Claude’s identification of youth mobility scheme as positive appreciation seems fuzzy, since it does not clearly indicate its implicit realisation, as Claude occasionally does in brackets or by using lexis that expresses implication.

(12) UK universities urge government to restart flow of EU students after Brexit

In (12), the human annotators coded urge as positive affect alongside engagement, because of its likely connection with the desire UK universities have for the reinstatement of student exchanges, whereas ChatGPT, MS Copilot, and Claude did not code it as such. Furthermore, MS Copilot immediately refuted such coding when challenged via a subsequent prompt, whereas ChatGPT agreed with it. Claude identified it as “positive judgment of universities’ proactive stance,” which may be reasonable, but is more implicit than explicit in nature. Since no indication of implicitness was provided by Claude, as mentioned above, users would regard such attitudes as explicit. According to deVore (1949), urge can be connected to emotions when it is stimulated by some contact within an environment, and the UK universities are stimulated in terms of wanting to attract as many EU students as possible. Since the appraisal system relies heavily on context, the annotators’ exactness is clearly evident in this example, while a failure to give precise and direct answers can be detected in the AI responses. ChatGPT’s reply was even more human-like, acknowledging the annotators’ thoughtful question. However, surprisingly, when MS Copilot was prompted again to explain why urge could not imply affect in terms of wish or desire, it agreed with such coding.
If MS Copilot had not been asked for clarification, the user would have accepted the first analysis, missing out on other possible interpretations and codings, which would have diminished the thoroughness of the analysis. When asked whether urge was explicit or implicit affect, MS Copilot said it was explicit, making this coding compatible with that of the human annotators. In contrast, ChatGPT acknowledged its implicit realisation. When Claude was asked whether urge could be affect, it still insisted on judgement as a better choice than affect, providing an explanation contrasting the two categories. However, after subsequent prompts clarifying the meaning of urge as wish or desire, Claude finally accepted its coding as implicit affect. Interestingly, when the same prompt was used again later, Claude identified urge as explicit affect (the same happened with hopeful). This may be a good illustration of the AI’s learning nature. Furthermore, the AI tools’ identification of urge as graduation, which the human annotators overlooked, was reasonable, since it can be unpacked as desire + strong. On the other hand, certain instances, such as after Brexit, UK, and EU (quantification as time and extent), were identified as graduation by the human annotators, but not by MS Copilot. Claude identified EU students as graduation in terms of specificity, which signals broader knowledge of graduation. MS Copilot also coded restart as graduation via quantification, whereas the human annotators coded it as graduation via intensification because of the repetition conveyed by the prefix re-. Although Claude also identified restart as graduation, its explanation that it “precisely frames the desired action” is opaque. It was further observed that ChatGPT referred to the phrase restart flow of EU students after Brexit as an implicit attitude, but listed it as an explicit one, which was extremely confusing.
(13) Up to now, most of the focus on reviving post-Brexit opportunities for young people has been focused on an EU proposal in April for a youth mobility scheme that would allow under-30s to study or work abroad for a limited number of years.

In (13), the human annotators identified the same graders as all three AI tools (e.g., limited number of years, most of), adding also post-Brexit, focused, focus, and abroad as graduation (quantification via time and place, and focus specificity). While MS Copilot identified in April as graduation via specificity, the human annotators identified it as graduation via quantification (time). Interestingly, MS Copilot and Claude identified up to now as engagement, although it is clearly graduation (quantification as time/extent). Moreover, Claude identified reviving as implicit negative appreciation targeting the current state that needs revival, as well as implicit positive appreciation targeting “potential future opportunities,” which is unclear, whereas the human annotators coded it as explicit positive appreciation based on its denotational meaning, to bring something back to life.

(14) But as Keir Starmer prepares for his first bilateral meeting with the European Commission president, Ursula von der Leyen, on Wednesday, […]

In contrast to all three chatbots, the human annotators identified implied positive judgement targeting Ursula von der Leyen’s position in (14). Furthermore, upon a plausible explanation by ChatGPT, pointing indirectly to Starmer’s diligence and readiness via his preparation, the coding of this proposition as implicit positive judgement became obvious. Claude, on the other hand, coded bilateral as implicit positive appreciation based on the political discourse (context), in which such a meeting is significant and carries diplomatic weight. With this, Claude showcased strong awareness of context.
Moreover, since the European Commission president, Ursula von der Leyen was identified as non-attitudinal by MS Copilot, why did it categorise the phrase as explicit appreciation? And since Ursula von der Leyen’s role was identified as the target of evaluation by MS Copilot, could it then also have been identified as judgement? Unfortunately, the user did not ask for clarification, which again points to the problem of providing sufficient prompting, which may not only be time-consuming but may also require elevated levels of creativity and exactness in forming prompts.

5 Conclusion

This paper compares the analysis of appraisal performed by ChatGPT, MS Copilot, and Claude with that performed by humans on a selected newspaper article. By applying a qualitative research method, the paper provides important insight into the differences and similarities in the identification and annotation of appraisals, based on the systematic and fully developed evaluative model of Martin and White (2005). While humans rely on context and subjective human experience alongside knowledge of appraisal theory to analyse a text in terms of appraisals, AI models depend on pre-trained datasets to approximate these functions. ChatGPT’s responses can be extensive, yet they often lack grounding in facts, and repetition of previous answers or sentences is evident. MS Copilot’s responses can be less conversational, focusing more on providing straightforward answers, although they are occasionally even more conversational than ChatGPT’s. ChatGPT and MS Copilot generally selected broader phrases for analysis rather than individual appraisal instances, without highlighting the explicit element responsible for the evaluation, which was occasionally confusing. In contrast, Claude focused on individual instances. All three chatbots showed adaptability when prompted for clarification, often accepting coding suggestions and refining their analyses accordingly.
ChatGPT provided answers similar to MS Copilot’s, yet after the generated answer it added systematic summaries (sometimes lengthy and repetitive), offering a useful structured reflection on the analysis. On the other hand, this tendency to prolong the answer could be interpreted as an attempt to make the analysis seem precise. All three chatbots provided better answers when dealing with smaller sections of text, although even in such cases subsequent prompting was often necessary. Additionally, consistency was occasionally problematic after subsequent prompting, which is in line with the research by Imamović et al. (2024). Moreover, since certain explanations next to examples were ambiguous, a closer (and time-consuming) examination of these was required. All three AI tools demonstrated certain discrepancies in categorising attitude, graduation, and engagement, with ChatGPT and Claude showing greater alignment with the human annotators in identifying implicit attitudes. Additionally, discrepancies were also observed in the identification of attitudinal status and realisation. All three chatbots also often lacked consistency in distinguishing implicit from explicit attitudes, frequently requiring human intervention for accuracy. Based on ChatGPT’s answers, however, the authors noticed a recurring pattern: if ChatGPT provided information about a phrase or sentence being expressed implicitly or explicitly, it usually did not provide information about whether the phrase or sentence was expressed positively or negatively, which was also noticed in some of Claude’s responses. The human annotators identified more appraisal instances overall when dealing with the whole text, often pinpointing nuances that the AI tools overlooked, such as specific instances of engagement and graduation.
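The observation that one annotator set identifies more instances than another can be made explicit by treating the human annotations as a reference set and computing precision and recall over the items each tool identified. The following is an illustrative sketch only; the (word, category) pairs are hypothetical and do not reproduce the study’s data:

```python
def precision_recall(predicted, reference):
    """Precision and recall of one annotator's identified items against a reference set."""
    predicted, reference = set(predicted), set(reference)
    tp = len(predicted & reference)                      # items both sides identified
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    return precision, recall

# hypothetical appraisal items as (word, category) pairs
human = {("hopeful", "affect"), ("uncomfortable", "affect"),
         ("toxic", "appreciation"), ("good", "judgement")}
chatbot = {("hopeful", "affect"), ("toxic", "appreciation"),
           ("fantastic", "appreciation")}
p, r = precision_recall(chatbot, human)
# here 2 of the chatbot's 3 items match the human set (precision 2/3),
# covering 2 of the 4 human items (recall 1/2)
```

Low recall with high precision would correspond to the pattern reported above, where the chatbots’ identifications are often defensible but miss instances that the human annotators capture.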
When the focus was specifically on implicit attitudes, Claude in particular often identified more attitudes than the human annotators. While a closer study of some of these attitudes showed that their identification was useful for the human annotators, it also pointed to potential over-analysis, especially since Claude, as it admitted, went beyond appraisal theory by drawing on knowledge of pragmatics, sociolinguistics, and critical discourse analysis, which is reasonable and welcome in itself. However, with appraisal being the focus of the study, this could be problematic, since implicitness and evaluation in the literature may encompass various elements. The findings suggest that while AI tools such as ChatGPT, MS Copilot, and Claude can provide valuable insight into appraisal via rapid responses, they cannot entirely replace human annotators in capturing the complexity of evaluative language, in terms of accuracy in the identification of appraisals and implicit attitudes, along with consideration of context, which can be beneficial for appraisal research. This echoes the conclusions drawn by Hazemali et al. (2024), where the GPT-3.5-powered PDFGear Copilot exhibited competence in retrieving explicit information but struggled with in-depth interpretative tasks. Given the parallels between historical document analysis and appraisal research, it is evident that AI chatbots require human oversight to ensure accurate and contextually appropriate linguistic analysis. In other words, sole reliance on AI chatbots for an accurate and fine-grained analysis of appraisal is so far insufficient, and human assistance is indispensable. This study is based on a single text, so the results may not be fully generalisable across genres, datasets, and linguistic contexts. Additionally, the results may also be affected by previously published appraisal analyses accessible to the AI, by situational and cultural context, and by the use of linguistic sources (co-text).
For example, human and AI analyses of more factual texts, which deploy mainly explicit attitudes and less authorial intervention, might be more in sync than analyses of texts that are rich in figurative language or allow for a greater variety of interpretation. Although the current study refrains from generalisation, owing to its exploration of a single text, it lays the foundation for future research that could further explore appraisal coding by utilising a larger database, a variety of media outlets, or other AI chatbots such as Gemini, Perplexity, Qwen, and DeepSeek. Future research could also investigate any subjectivity and/or (non-)bias in implicit attitudes when these are identified by the AI tools, since human analysts should strive to adopt as neutral a reading position as possible, although, according to Martin and White (2005, 207), undesirable subjectivity cannot be entirely excluded from human appraisal analysis.

References

Crosthwaite, Peter, and Vit Baisa. 2023. “Generative AI and the end of corpus-assisted data-driven learning? Not so fast!” Applied Corpus Linguistics 3 (3): 100066. https://doi.org/10.1016/j.acorp.2023.100066.

Curry, Niall, Paul Baker, and Gavin Brookes. 2024. “Generative AI for corpus approaches to discourse studies: A critical evaluation of ChatGPT.” Applied Corpus Linguistics 4 (1): 100082. https://doi.org/10.1016/j.acorp.2023.100082.

deVore, Nicholas. 1949. “The urges and the emotions.” In New Frontiers of Psychology, by Nicholas deVore, 42–53. Philosophical Library. https://doi.org/10.1037/13248-006.

Halliday, Michael A.K., and Christian M.I.M. Matthiessen. 2004. An Introduction to Functional Grammar. Hodder Arnold.

Hazemali, David, Janez Osojnik, Tomaž Onič, Tadej Todorović, and Mladen Borovič. 2024. “Evaluating chatbot assistance in historical document analysis.” Moderna arhivistika 7 (2): 53–83. https://doi.org/10.54356/ma/2024/biub3010.

Hood, Susan. 2004.
“Appraising research: Taking a stance in academic writing.” PhD diss., University of Technology, Sydney. http://www.grammatics.com/appraisal/suehoodphd/hood_title_page.pdf.
Hunston, Susan, and Geoff Thompson, eds. 1999. Evaluation in Text: Authorial Stance and the Construction of Discourse. Oxford University Press.
Imamović, Mirela, Silvana Deilen, Ekaterina Lapshinova-Koltunski, and Dylan Glynn. 2024. “Using ChatGPT for annotation of attitude within the appraisal theory: Lessons learned.” In Proceedings of the 18th Linguistic Annotation Workshop (LAW-XVIII), edited by Sophie Henning and Manfred Stede, 112–23. Association for Computational Linguistics.
Koeva, Svetla. 2024. “Large language models in linguistic research: The Pilot and the Copilot.” In Proceedings of the Sixth International Conference on Computational Linguistics in Bulgaria (CLIB 2024), 319–28. Department of Computational Linguistics, Institute for Bulgarian Language, Bulgarian Academy of Sciences. https://aclanthology.org/2024.clib-1.35/.
Kress, Gunther. 1988. Communication and Culture: An Introduction. New South Wales University Press.
Križan, Agata. 2016. “The language of appraisal in British advertisements: The construal of attitudinal judgement.” ELOPE: English Language Overseas Perspectives and Enquiries 13 (2): 199–220. https://doi.org/10.4312/elope.13.1.15-29.
Leech, Geoffrey. 1993. “Corpus annotation schemes.” Literary and Linguistic Computing 8 (4): 275–81.
Lozić, Edisa, and Benjamin Štular. 2023. “Fluent but not factual: A comparative analysis of ChatGPT and other AI chatbots’ proficiency and originality in scientific writing for humanities.” Future Internet 15 (10): 336. https://doi.org/10.3390/fi15100336.
Macken-Horarik, Mary. 2003. “Appraisal and the special instructiveness of narrative.” Text – Interdisciplinary Journal for the Study of Discourse 23 (2): 285–312. https://doi.org/10.1515/text.2003.012.
Martin, Jim R., and Peter Robert Rupert White. 2005.
The Language of Evaluation: Appraisal in English. Palgrave Macmillan.
Orel Kos, Silvana. 2024. “Introduction of machine translation into audiovisual translation teaching.” ELOPE: English Language Overseas Perspectives and Enquiries 21 (1): 185–208. https://doi.org/10.4312/elope.21.1.185-208.
Page, Ruth E. 2003. “An analysis of appraisal in childbirth narratives with special consideration of gender and storytelling style.” Text 23 (2): 211–37. https://doi.org/10.1515/text.2003.012.
Shahriar, Sakib, and Kadhim Hayawi. 2023. “Let’s have a chat! A conversation with ChatGPT: Technology, applications, and limitations.” Artificial Intelligence and Applications 2 (1): 11–20. https://doi.org/10.47852/bonviewaia3202939.
Thompson, Geoff. 2008. “Appraising glances: Evaluating Martin’s model of APPRAISAL.” Word 59 (1–2): 169–87. https://doi.org/10.1080/00437956.2008.11432585.
—. 2014. “AFFECT and emotion, target-value mismatches, and Russian dolls: Refining the APPRAISAL model.” In Evaluation in Context, edited by Geoff Thompson and Laura Alba-Juez, 47–66. John Benjamins.
Tica, Lena, and Ivana Krsmanović. 2024. “Overcoming the writer’s block? Exploring students’ motivation and perspectives on using ChatGPT as a writing assistance tool in ESP.” ELOPE: English Language Overseas Perspectives and Enquiries 21 (1): 129–49. https://doi.org/10.4312/elope.21.1.129-149.
Uchida, Satoru. 2024. “Using early LLMs for corpus linguistics: Examining ChatGPT’s potential and limitations.” Applied Corpus Linguistics 4 (1): 100089. https://doi.org/10.1016/j.acorp.2024.100089.
White, Peter Robert Rupert. 2000. Functional Grammar. Centre for English Language Studies, University of Birmingham.

Primary Source

O’Carroll, Lisa. 2024. “UK universities urge government to restart flow of EU students after Brexit.” The Guardian, September 30.
https://www.theguardian.com/education/2024/sep/30/uk-universities-urge-government-to-restart-flow-of-eu-students-after-brexit.

Part III: Academic Writing

2025, Vol. 22 (1), 55–68(228)
journals.uni-lj.si/elope
https://doi.org/10.4312/elope.22.1.55-68
UDC: [811.111’243:378]:004.89

Silvana Neshkovska
University “St. Kliment Ohridski”, Bitola, North Macedonia

The Benefits and Risks of AI-Assisted Academic Writing: Insights from Current Research

ABSTRACT

This paper explores the transformative role of Artificial Intelligence (AI) tools, specifically ChatGPT, in the acquisition of English as a foreign language. With the rapid evolution of educational technology, AI-driven chatbots like ChatGPT offer innovative methodologies to augment language teaching and learning. This study examines the potential of ChatGPT to improve English language students’ writing abilities by providing suggestions, corrections and automated assistance. Through a review of existing literature and a discussion of the findings of recent studies, the paper seeks to highlight the benefits and risks of integrating AI tools into language education, especially in the context of writing. Insights gained from multiple studies suggest that while ChatGPT has the potential to significantly enhance language students’ writing skills in all phases of writing, by promoting engagement, motivation, and autonomy among learners, it also necessitates cautious use to ensure academic integrity and to prevent over-reliance, which, in turn, can stifle students’ learning capacities.

Keywords: writing, EFL, AI, ChatGPT, benefits, risks

Prednosti in tveganja pri znanstvenem pisanju s pomočjo umetne inteligence: spoznanja iz aktualnih raziskav

IZVLEČEK

Prispevek raziskuje, kako so orodja umetne inteligence (UI), zlasti ChatGPT, preoblikovala učenje angleščine kot tujega jezika.
S hitrim razvojem izobraževalne tehnologije pogovorni sistemi, kot je ChatGPT, ponujajo inovativne pristope za nadgradnjo poučevanja in učenja jezikov. Študija preučuje potencial orodja ChatGPT za izboljšanje pisnih spretnosti študentov in študentk angleščine s pomočjo predlogov, popravkov in samodejne pomoči. Na podlagi pregleda obstoječe literature in analize ugotovitev nedavnih raziskav prispevek osvetljuje prednosti in tveganja pri vključevanju orodij umetne inteligence v učenje jezikov, še posebej na področju pisanja. Ugotovitve številnih študij kažejo, da lahko ChatGPT bistveno izboljša pisne zmožnosti študentov in študentk v vseh fazah procesa pisanja, saj spodbuja njihovo vključenost, motivacijo in samostojnost. Kljub temu pa njegova uporaba zahteva premišljeno rabo, saj je treba zagotoviti spoštovanje akademske integritete in preprečiti pretirano zanašanje na tehnologijo, kar bi lahko zavrlo razvoj učnih sposobnosti.

Ključne besede: pisanje, angleščina kot tuji jezik, umetna inteligenca, ChatGPT, prednosti, tveganja

1 Introduction

Writing is a fundamental language skill that foreign language learners must acquire, yet it remains one of the most challenging aspects of language acquisition. This is particularly true for academic writing, a critical skill that students must master at the university level (Yang 2024; Özçelik and Ekşi 2024; Malá, Brůhová, and Vašků 2022). With the advent of artificial intelligence (AI), foreign language acquisition is undergoing a profound transformation. An ample body of literature shows that teaching and learning practices are being fundamentally reshaped, and this shift extends to the domain of writing as well. AI, particularly through chatbots like ChatGPT, has introduced a new dimension to the development of writing skills.
On the one hand, it offers significant opportunities for enhancing students’ writing proficiency; on the other hand, it presents risks and challenges that may disorient students and seriously undermine their academic growth and performance (Nguyen, Ngoc, and Dan 2024; Imran and Almusharraf 2023; Masoudi 2024; Briggs 2018; Mun 2024, etc.). Many of these studies, apart from emphasizing the advantages of AI in language learning (Jazbec 2024), also shed light on students’ perceptions of and experiences with the use of AI in this context (Mahapatra 2024; Rahmi et al. 2024; Artiana and Fakhrurriana 2024; Mun 2024; Özçelik and Ekşi 2024; Nguyen, Ngoc, and Dan 2024; Song and Song 2023; Tica and Krsmanović 2024; Khampusaen 2025, etc.).

Drawing on recent studies conducted in various parts of the world, this paper aims to highlight the practical implications of using a specific AI-driven tool, ChatGPT, in foreign language classrooms. More precisely, by reviewing the existing literature on AI-assisted academic writing, this study explores potential strategies for effectively utilizing ChatGPT in completing academic writing assignments. It examines how language students can leverage such technologies to enhance their writing skills, improve efficiency, and receive personalized support. At the same time, the study considers the risks and implications that the incorporation of such technologies might have for students’ academic well-being. Lastly, by reviewing the findings of recent research, this study attempts to shed some light on students’ perceptions of the use of ChatGPT in academic writing.

2 Theoretical Background

2.1 Academic Writing in the Context of EFL

Writing is often characterised as the most challenging of the four language skills for second-language learners (Richards and Renandya 2002; Hyland 2003; Tica and Krsmanović 2024). This view is widely supported by researchers, teachers (Hyland 2003) and language students (Byrne 1993, in Tran 2024).
Writing proficiency is often seen as a key factor for success in exams, recruitment tests, and general social standing (Dastgeer, Afzal, and Atta 2021, in Nguyen, Ngoc, and Dan 2024, 171). More specifically, writing serves as a crucial prerequisite not only in education but also in personal and professional endeavours (Yang 2024; Özçelik and Ekşi 2024), because it promotes communication, enhances thinking skills and encourages reflection among students (Klimova 2012, in Özçelik and Ekşi 2024).

However, viewed from another perspective, the complex cognitive processes that underlie writing make it extremely challenging for foreign language learners. Students are required to produce, arrange, and transform their thoughts, opinions, attitudes, and feelings clearly and coherently in written form (Richards and Renandya 2002). According to Nunan (2003, 88), writing is “the mental work of inventing ideas, thinking about how to express them, and organising them into sentences and paragraphs that will be clear to a reader.” Thus, proficient English writing necessitates not only a comprehensive understanding of the language – an extensive lexicon, appropriate word selection, grammatical principles, punctuation and spelling rules – but also knowledge of layout conventions, sentence and paragraph organisation, and appropriate use of register and style (Nguyen, Ngoc, and Dan 2024; Özçelik and Ekşi 2024; Sari and Agustina 2022). Similarly, Ferris (2018) emphasises that effective academic writing involves both an advanced grasp of linguistic aspects (e.g., vocabulary, spelling, grammar, cohesive devices, punctuation, capitalization, and formatting) and sufficient knowledge of extra-linguistic features (e.g., the content and context of writing, its purpose and its audience). According to Mun (2024), an additional factor that complicates matters further is the time limitation that normally accompanies academic writing assignments.
Because of time constraints, students can lose the motivation to invest themselves fully in the writing process, which, in turn, seriously hinders the development of their writing abilities. Clearly, academic writing (irrespective of its format – essays, reports, studies, etc.) is not just a matter of linguistic competence; it requires broader socio-cultural and world knowledge. Taking all of this into consideration, it is unsurprising that many tertiary-level students find writing assignments daunting (Artiana and Fakhrurriana 2024; Khatter 2019; Rahmat et al. 2017). As Campbell (2019, in Rahmi et al. 2024) rightfully points out, academic writing in English is a complex and integrative task, not only for international students but for native speakers as well.

2.2 AI in Education, Foreign Language Acquisition and Writing

Recent years have seen a visible surge in AI-powered tools, which have left an indelible mark on several sectors, including education. These novel, versatile tools can perform multiple functions and are consequently seen as promising resources that can enhance student learning (Nazari et al. 2021, in Rahmi et al. 2024). Their capacity to exhibit human-like behaviour and cognitive abilities, including learning, self-correction, adaptation, reasoning, problem-solving, decision-making, and language comprehension, makes them especially beneficial in educational environments (Shidiq 2023, in Artiana and Fakhrurriana 2024; Popenici and Kerr 2017, in Rahmi et al. 2024). Chatbots are a special type of AI-driven tool that is particularly advantageous in foreign language acquisition (Nguyen, Ngoc, and Dan 2024; Batanero et al. 2021, in Tran 2024).
Researchers outline a long list of distinct benefits of using chatbots in language learning contexts: the creation of a relaxed learning environment; heightened student motivation; enhanced student enjoyment; reduced language anxiety; access to diverse learning resources; immediate and effective feedback on spelling and grammar; facilitation of reading and listening practice; and the provision of patient conversation partners (Fryer and Carpenter 2006, 9–10). These AI tools are also credited with reinforcing students’ sense of autonomy and engagement (Yang 2024), their creative and critical thinking and problem-solving capabilities (Karataş et al. 2024; Kasneci et al. 2023), and with enlarging students’ vocabulary (Kohnke, Moorhouse, and Zou 2023).

Writing skills have also been significantly impacted by the application of these technological advances in the foreign language classroom (Kasneci et al. 2023). Purcell et al. (2013, in Rahmi et al. 2024), in that respect, purport that the positive influence of these digital technologies on students’ writing extends to both non-native and native English users.

Among the AI-driven tools, ChatGPT holds the place of honour. Released in November 2022,[1] ChatGPT is a Large Language Model (LLM) that has changed the education scene immensely (Nguyen 2023). This text generation tool rapidly reached over 100 million users and attained a market-leading position (van Dis et al. 2023; Peachey 2023; Hu 2023; Dobrin 2023). Although ChatGPT is neither the first nor the only AI-driven chatbot, what sets it apart from other chatbots is that it was pre-trained on a vast corpus of human-generated texts, which is why it excels at using natural language and generating highly human-like texts (Yang 2024; Anderson 2023, in Jen and Salam 2024).
In fact, because of all the texts to which it was exposed during training, it generates immediate responses to text-based instructions provided by the user (“prompts”) (Hellstrom 2024). Depending on the prompts it receives, it can provide answers to questions and generate different kinds of text (Farina and Lavazza 2023, 2), ranging from social media posts to emails, blog articles, and overviews of research studies; it can also produce summaries, inferences, comparisons, sentiment analysis, and translations into other languages (Hellstrom 2024, 2). It handles follow-up questions with ease, acknowledges mistakes, challenges incorrect assumptions, refuses inappropriate requests, and, most importantly, with ongoing human input, continuously improves its performance (Masoudi 2024, 64).

Research shows that this AI tool, through its advanced algorithms and natural language use, has significant potential to improve students’ writing ability by offering grammar corrections, suggestions, and comprehensive feedback (Osorio 2023, in Masoudi 2024, 65), i.e., by supplying ideas as well as final proofreading and editing of written material (Imran and Almusharraf 2023, 2). A crucial factor contributing to its widespread use in education is that today’s students, as digital natives, are accustomed to technology in their daily lives (Briggs 2018; Mun 2024), and they find the tool uncomplicated and straightforward to use. In the following sections, we explore the benefits and risks of incorporating ChatGPT in academic writing, as well as students’ perceptions of this issue, by discussing the findings and insights of several recent studies conducted in diverse academic settings.
3 Review of Recent Research

3.1 The Benefits and Risks of Incorporating ChatGPT in Academic Writing

Although some researchers claim that there is a serious lack of comprehensive empirical research confirming ChatGPT’s immense potential for augmenting language learners’ skills (Barrot 2023, in Mun 2024; Nguyen, Ngoc, and Dan 2024; Artiana and Fakhrurriana 2024; Yang 2024; Özçelik and Ekşi 2024; Su et al. 2023, in Mahapatra 2024), there is no denying that the number of studies dealing with this issue and contributing to this discussion has been growing exponentially in recent years.

[1] ChatGPT was initially released by OpenAI in 2018. The significant advances in the model, however, led to the release of the ChatGPT-3.5 model in November 2022, and the ChatGPT-4 model in March 2023.

The findings of a vast pool of recent studies indicate that, if used appropriately, this large generative language model can genuinely and substantially improve students’ writing capabilities (Sawangwan 2024; Mun 2024; Khampusaen 2025). This AI-driven tool has been labelled a real game-changer in language education, primarily because it is very student-friendly and can provide more need-based or personalised assistance than similar tools (Rudolph, Tan, and Tan 2023, 350). More specifically, its real expertise in the context of writing lies in its ability to respond to user queries regarding various aspects of writing by offering suggestions, functioning as a support-on-demand tool, admitting mistakes and rectifying itself (Mahapatra 2024, 3). In essence, its main advantage is that it supports student writing by providing direction on both the content and the organisation of the writing assignment at all phases of writing (Chan and Hu 2023). In the pre-writing phase, ChatGPT alleviates the process of writing (Stokel-Walker 2022, in Mahapatra 2024, 3), primarily by generating ideas (Lingard 2023, in Mahapatra 2024, 3).
In fact, ChatGPT serves as “an invaluable writing assistant which offers prompt responses and assists in brainstorming sessions” (Nguyen, Ngoc, and Dan 2024, 182) by generating new ideas for writing assignments, suggesting “topics, themes, and perspectives that they might not have considered otherwise” (Kasneci et al. 2023; Taecharungroj 2023, in Imran and Almusharraf 2023, 3), or by expanding upon users’ topics, presenting new aspects of their ideas, or providing contextually relevant suggestions (Bhatia 2023, in Nguyen, Ngoc, and Dan 2024, 182). All of these ‘interventions’ aid students “in overcoming their initial writer’s block, and in fostering their creativity, during the initial stages of writing” (Nguyen, Ngoc, and Dan 2024, 182).

After the completion of the pre-writing stage, ChatGPT can be employed to provide corrective feedback (Dai et al. 2023, in Mahapatra 2024) on text organisation, especially on the logical organisation of content and thoughts, the addition of appropriate supporting details, the inclusion of suitable concluding remarks (Fitria 2023), the provision of logical connections between paragraphs (Nugroho, Putro, and Syamsi 2023), and the enhancement of writing mechanics, such as spelling, capitalization, and punctuation (Zirar 2023).

During the actual process of writing, ChatGPT’s corrective feedback can also target language use and grammar (Nguyen 2023) as well as vocabulary (Wang and Guo 2023). In other words, ChatGPT can provide access to grammar materials on various topics such as tenses, active and passive sentences, gerunds, infinitives, and the syntactic structure of sentences. It can also suggest appropriate vocabulary choices by providing synonyms and alternatives for words and phrases, which can be extremely helpful for non-native English speakers in their quest to express their ideas (Huang and Tan 2023, 1150–51).
ChatGPT can work as “an alternative to dictionaries and model more advanced use of foreign learning” in the context of writing (Mun 2024, 27). Furthermore, during the writing phase, this chatbot can also be used to ensure that students are using the appropriate style and tone for their specific writing assignment (Hellstrom 2024). Namely, ChatGPT can improve “the formality and clarity of their writing, ensuring a more accurate presentation of their ideas” (Nguyen, Ngoc, and Dan 2024, 184).

In the revision phase, language students can utilize ChatGPT for editing and proofreading purposes. While editing is mostly concerned with clarity and concision and with correcting wordiness, proofreading targets the final polishing of verb constructions, punctuation, grammar, and spelling (Diamond and Allen 2024; Dobrin 2023).

In addition to these features – the generation of ideas, assistance with content and structure organisation, and language editing and proofreading – ChatGPT can help detect plagiarism by comparing a given text to existing published sources, thereby verifying its originality and determining whether it has been copied from other works (Huang and Tan 2023). Additionally, ChatGPT can provide “guidance on proper citation formats” and generate “reference entries for various citation styles” (Jarrah, Wardat, and Fidalgo 2023, in Nguyen, Ngoc, and Dan 2024, 184).

The only prerequisite for obtaining adequate assistance from ChatGPT is for students to be trained in proper “prompt engineering”, which essentially means entering precise and concise instructions into ChatGPT’s input box (Diamond and Allen 2024; Dobrin 2023). Effective “prompt engineering” is vital at all stages of the writing process (Diamond and Allen 2024; Hellstrom 2024).
Well-crafted prompts help to avoid vague or generic responses, ensure accuracy, and prevent ChatGPT from generating offensive or misleading content. Diamond and Allen (2024), Dobrin (2023), and Skrabut (2023) call for the continuous refinement of prompts based on the feedback received. To save time and enhance the efficiency of all writing phases, students are advised to build a library of specialized prompts to which they can constantly refer (Diamond and Allen 2024; Peachey 2023).

Given all the abovementioned insights from previous studies, it is safe to conclude that ChatGPT constitutes an invaluable tool capable of providing users with a solid foundation for their writing assignments. When employed effectively, it holds the potential to significantly enhance students’ academic writing experience by offering both useful guidance and feedback (Raheem et al. 2023, in Nguyen, Ngoc, and Dan 2024, 179).

Despite these considerable benefits, students must be consistently reminded that ChatGPT should serve as a supplemental tool – specifically, as a writing assistant – rather than a content creator that diminishes their role or, even worse, entirely replaces their input (Mun 2024; Barrot 2023; Tran, Ngan, and Uyen 2025; Nguyen, Ti, and Hoa 2025). Put differently, students should embrace the idea that while machines can help construct good writing, humans are still the main actors controlling the flow of the writing process (Sumakul, Hamied, and Sukyadi 2021). Current research constantly draws attention to the dangers that ChatGPT’s use can pose in the context of academic writing if it is not treated solely as an assistant. The most obvious negative ramification of student overreliance on ChatGPT is a diminished ability to learn and develop their own writing skills, since students could get used to obtaining ready-made texts (Mun 2024).
The same goes for their ability to detect and correct their mistakes and to develop their creative and critical thinking skills (Kornfeld and Roy 2021, in Tran 2024; Nguyen, Ti, and Hoa 2025). Chatbots’ limitations in interpretative and nuanced tasks have also been well documented. For instance, Hazemali et al. (2024) demonstrated that chatbots often falter when tasked with complex contextual analyses, such as drawing cause-and-effect relationships in historical document reviews. This highlights the need for human oversight to ensure accuracy and depth in academic writing. These genuine threats to learners’ development of critical thinking and writing abilities have impelled a number of teachers and school administrators to perceive ChatGPT as the opening of Pandora’s box (Hong 2023, in Sawangwan 2024, 1). This, in turn, has culminated in educational institutions in some countries announcing outright bans on the use of this chatbot (Reuters 2023, in Sawangwan 2024, 1).

ChatGPT’s potential to threaten academic honesty and ethical conduct (Yan 2023, in Mahapatra 2024) can be observed in the fact that the factual content generated by ChatGPT is sometimes incorrect, so human control and intervention are required (Hellstrom 2024). In fact, ChatGPT, like other generative AI systems, is susceptible to responses known as ‘hallucinations’, which, in essence, are false outputs despite appearing correct. These kinds of responses may occur because of a lack of sufficient information, vague or unclear prompts, limited or overly specific data within a language model, or biased datasets. As a result, they might contain incorrect citations, non-existent sources, or entirely fabricated information (Dobrin 2023). Hence, students are advised to always double-check ChatGPT-generated content for accuracy and relevance by consulting reliable resources (Dobrin 2023; Diamond and Allen 2024; Hellstrom 2024; Hazemali et al. 2024; Nguyen, Ti, and Hoa 2025).
Lastly, ChatGPT can encourage cheating and plagiarism in some students, especially those who struggle with writing assignments (Jen and Salam 2024). In the most apocalyptic scenario, its continuous and indiscriminate use could drastically reduce and change the need for, ability at, and valuation of human writing; in other words, it could drastically decrease trust in the written word, as it would become difficult to prove whether a text was produced by a human being or a machine (Hellstrom 2024).

3.2 Insights from Previous Studies Regarding Student Perceptions on the Use of ChatGPT in Academic Writing

In this section, we discuss the findings of a selection of recent studies dealing with the role of ChatGPT in enhancing various aspects of language students’ writing skills, as well as students’ perceptions of ChatGPT’s ‘interference’ with their writing.

Nguyen, Ngoc and Dan (2024) investigated Vietnamese students’ perceptions of ChatGPT’s usefulness by conducting a questionnaire and interviews, focusing on eight aspects of writing development: vocabulary, grammar, idea generation, organisation, translation, writing style, plagiarism management, and the mechanics of writing. Student responses revealed a moderately positive attitude towards ChatGPT’s use for writing purposes, with the highest ratings given to idea generation, followed by vocabulary, grammar, organisation, and writing style, and notably less pronounced interest in using ChatGPT for plagiarism management, translation, and the mechanics of writing. As to the limitations of using ChatGPT, students voiced concerns about its tendency to produce nonspecific or irrelevant responses, the risk of over-reliance on the tool, and its inability to provide reliable references.
Based on these findings, Nguyen, Ngoc and Dan (2024) concluded that ChatGPT both streamlines the writing process, allowing students to upgrade their argumentative writing skills at a fast pace, and promotes a more engaging and dynamic approach to language acquisition and composition in general.

Similarly, Song and Song (2023) assessed the influence of ChatGPT on the writing abilities and motivation of Chinese EFL students. Using a pre-test and post-test design, they compared the writing skills of 50 students, who were randomly assigned to control and experimental groups. In addition to the tests, semi-structured interviews explored the students’ motivation for and experiences with AI-assisted learning. The results indicated that ChatGPT helped improve vocabulary, grammar, organisation, and idea generation in the experimental group in comparison with those receiving traditional instruction. Students also expressed concerns about AI’s accuracy in certain contexts and the dangers of becoming overly dependent on it.

Yang’s (2024) empirical study explored the impact of ChatGPT on writing proficiency among Chinese EFL learners. Using a qualitative case study approach, the study included Chinese undergraduate students who participated in semi-structured interviews intended to provide in-depth insight into their experiences with ChatGPT. Focusing on the planning and revision phases of the writing process, the study showed that ChatGPT “aids in planning by helping students think deeply, generate ideas, and organize them coherently” (Yang 2024, 176). Furthermore, the study highlights that “during revision, it provides feedback on grammar, spelling, and structure, refining expressions and producing polished writing” and that “students reported enhanced creative thinking and improved essay coherence and readability” (Yang 2024, 176).
Given these results, Yang (2024) concluded that integrating ChatGPT into writing instruction can effectively enhance students’ writing outcomes.

ChatGPT’s impact on the acquisition of register knowledge across various writing tasks among undergraduate students in Turkey was explored by Özçelik and Ekşi (2024). The students were asked to complete writing assignments, which were then checked by ChatGPT for corrections and suggestions. The researchers trained students in prompt engineering to help them achieve better results from ChatGPT. The study found that ChatGPT helped students overcome their initial reluctance to engage in writing tasks. It was particularly useful for acquiring formal register knowledge but less effective for teaching neutral register or informal writing.

In Mahapatra’s (2024) study, ChatGPT was examined as a feedback tool for the academic writing skills of undergraduate ESL students in a large Indian university classroom. His mixed-methods intervention involved pre-tests, post-tests, and delayed tests, and Mahapatra established that the employment of ChatGPT as a feedback tool had a substantially positive impact on students’ academic writing proficiency. The students expressed overwhelmingly favourable opinions about the tool, on the basis of which Mahapatra (2024) concluded that ChatGPT can serve as a dependable feedback tool for academic writing assignments.

Mun (2024), on the other hand, conducted a study among Korean EFL college students to understand how they used ChatGPT in essay writing and how they perceived its usefulness. The students were organised into an experimental group and a control group. They were given instructions by the same instructor, used the same course materials and syllabus, and underwent the same examinations. The participants took a pre-test and a post-test, during which they wrote an essay expressing their viewpoints on a selected topic.
The participants in the experimental group received instructions for writing adequate prompts and were told to use ChatGPT to individually proofread and revise their drafts. They submitted their second drafts after receiving feedback from ChatGPT, whereas the students from the control group submitted their drafts after receiving peer feedback in class. The findings of this study revealed a highly positive sentiment towards ChatGPT overall, with students perceiving it as a valuable and effective tool for English writing and language learning. They particularly pointed out its ease of use, convenience, and positive impact on grammar, vocabulary, and content organisation. Furthermore, the results indicated significantly improved writing performance in the experimental group compared to the control group. More precisely, according to Mun (2024, 36), the students in the experimental group exhibited “enhanced post-test writing quality in both structural and linguistic aspects, which surpassed considerably their pre-test scores”.

The perspectives of Indonesian EFL undergraduate students on using ChatGPT in academic writing were explored by Artiana and Fakhrurriana (2024) in a qualitative study. The participants used ChatGPT in their writing assignments, and the data was collected through observation, in-depth interviews and an analysis of academic writing tasks produced by the students. The researchers sought to assess the writing quality, language use, and developmental progress in academic writing among students using ChatGPT as a writing aid. The study revealed that ChatGPT accelerated the writing process, alleviated pressure, and helped students produce more fluent and better-structured texts. Students appreciated its assistance with idea organisation and argument construction, as well as its ability to offer alternative suggestions and phrasing options.
The integration of ChatGPT into the English language writing curriculum in Thai EFL universities was investigated by Sawangwan (2024). This study found that ChatGPT contributed to significant improvements in students’ proficiency, which moved from the B1 level to C1 according to the CEFR. Sawangwan (2024) also emphasized the evolving role of teachers as facilitators who guide students in the use of AI tools by providing technical support, establishing writing criteria, and offering ethical guidance. This shift in the role of teachers from “being completely in charge” to “being mere facilitators” allows them to focus more on curriculum development and personalized support, ultimately enhancing students’ writing performance (Sawangwan 2024, 14).

Rahmi et al. (2024) reported in their study that while Indonesian students generally viewed AI tools like ChatGPT quite favourably, they did note some drawbacks, including the tool’s lack of intentionality and its failure to replicate the nuances of human thought. Students felt that AI-generated text often lacked a “human touch” and could produce content that was predictable, stylistically inconsistent, or irrelevant to the topic.

Another study that underlines serious drawbacks of using AI tools in the context of academic writing is Tran, Ngan and Uyen’s (2025). This study focused on a group of postgraduate students majoring in English in Vietnam and their experiences with AI.
Interestingly, these students, in addition to the benefits, which mostly took the form of improved writing skills and immediate support, also underlined serious drawbacks: difficulty logging in and signing up for accounts when using AI tools; costly subscriptions and unstable Internet connections; the real danger of becoming overly reliant on AI-generated content and losing one’s thinking and writing skills; and, finally, the challenge of integrating AI-generated texts into one’s own writing while preserving one’s academic voice (Tran, Ngan, and Uyen 2025, 87).

Although this section does not provide a comprehensive overview of all current studies on AI-assisted writing, the available findings indicate that EFL students from diverse academic backgrounds around the world generally express positive attitudes towards the integration of AI tools – particularly ChatGPT – into their academic writing processes. The benefits stressed throughout the studies generally encompass grammar, vocabulary, idea generation, immediate and personalized feedback, register, motivation, proofreading, and editing. A common feature of the analysed studies is their reliance on similar research methodologies, which typically include interviews, questionnaires, analyses of students’ writing assignments, and pre- and post-tests. Moreover, most of these studies capture students’ perceptions over a short period and do not engage in longitudinal research that would track the evolution of students’ experiences and attitudes toward the use of AI tools in academic writing contexts. While the primary focus of the reviewed studies is on the benefits related to the content and structure of student writing, many also address notable drawbacks such as the potential for over-reliance on AI, the production of vague or irrelevant responses, and the inability of AI to replicate the nuances of human thought.
Nonetheless, the consensus across the studies is that the benefits outweigh the risks, and that the topic warrants further scholarly attention.

4 Conclusion

On the basis of the discussion above, it can be inferred that researchers have paid considerable attention to the application of ChatGPT in academic writing, despite the relative novelty of this AI tool. Given the complexity and high relevance of writing as one of the main language skills, this focus is unsurprising. The review of recent literature reveals that ChatGPT indeed holds significant promise as a tool for enhancing academic writing, particularly in the context of English language learning. Studies show that, when used effectively and ethically (with proper student training), ChatGPT has many benefits. It can support students in various stages of the writing process, from idea generation to revision, providing guidance on content, structure, grammar, and vocabulary, all while improving motivation. The advantages of using it include its role in facilitating brainstorming, improving writing mechanics, and providing corrective feedback. These advantages apply to both non-native and native speakers of English.

However, the integration of ChatGPT into academic writing is not without risks. Recent studies highlight that overreliance on the tool may hinder the development of students’ critical thinking, creativity, and self-editing skills. Additionally, there is a potential for academic dishonesty, as students might use it as a shortcut to complete writing assignments or to bypass the writing process entirely. The tool’s limitations in the form of occasional inaccuracies and “hallucinations” emphasize the need for students to exercise caution and verify the information generated by ChatGPT. Regarding students’ perspectives, the latest studies show that, in general, English language students from a range of academic backgrounds embrace this tool in their language acquisition process.
They report a positive impact on their writing proficiency, particularly in the planning, drafting and revision phases. It is of paramount importance to mention that students also display acute awareness of the downsides of using ChatGPT. In that context, they particularly underline its lack of nuanced, human-like language, occasional stylistic inconsistencies, shortcomings in the use of informal and neutral register, and difficulties logging in and signing up. Ultimately, the findings and insights gained from these studies show that while ChatGPT offers substantial support, it should be viewed as a supplemental tool, not as a replacement for the students’ own effort and intellectual engagement. Universities and language instructors must guide students in using AI tools responsibly, ensuring that these complement rather than replace student learning and development in academic writing. Thus, for instance, in the pre-writing phase, students should be encouraged to do the brainstorming independently first, and only then ask AI tools to generate additional ideas. Also, in the writing and revision phases, students should be instructed to be persistent in verifying the truthfulness and reliability of AI-generated content.

A major recommendation for future studies is to include longitudinal research that examines potential changes in students’ experiences with and attitudes toward the use of ChatGPT in writing contexts. Additionally, future research could address unresolved questions, such as how educators can train students to use ChatGPT ethically and whether universities should implement specific regulations to address the ethical challenges associated with using AI in writing assignments.

References

Artiana, Nisa, and Ria Fakhrurriana. 2024. “EFL undergraduate students’ perspective on using AI-based ChatGPT in academic writing.” Language and Education Journal 9 (1): 1–11.

Barrot, Jessie. 2023.
“Using ChatGPT for second language writing: Pitfalls and potentials.” Assessing Writing 57: 100745. https://doi.org/10.1016/j.asw.2023.100745.

Briggs, Nell. 2018. “Neural machine translation tools in the language learning classroom: Students’ use, perceptions, and analyses.” The JALT CALL Journal 14 (1): 2–24. https://doi.org/10.29140/jaltcall.v14n1.221.

Chan, Cecilia Ka Yuk, and Wenjie Hu. 2023. “Students’ voices on generative AI: Perceptions, benefits, and challenges in higher education.” International Journal of Educational Technology in Higher Education 20 (43): 1–18. https://doi.org/10.1186/s41239-023-00411-8.

Diamond, Stephanie, and Jeffrey Allan. 2024. Writing AI Prompts for Dummies. John Wiley & Sons.

Dobrin, Sidney. 2023. AI and Writing. Broadview Press.

Farina, Mirko, and Andrea Lavazza. 2023. “ChatGPT in society: Emerging issues.” Frontiers in Artificial Intelligence 6: 1130913. https://doi.org/10.3389/frai.2023.1130913.

Ferris, Dana R. 2018. “Writing in a second language.” In Teaching English to Second Language Learners in Academic Context: Reading, Writing, Listening, and Speaking, edited by Jonathan M. Newton, Dana R. Ferris, Christine C. M. Goh, William Grabe, Fredricka L. Stoller, and Larry Vandergrift, 75–122. Routledge. https://doi.org/10.4324/9781315626949-7.

Fitria, Tira Nur. 2023. “Artificial intelligence (AI) technology in OpenAI ChatGPT application: A review of ChatGPT in writing English essay.” ELT Forum: Journal of English Language Teaching 12 (1): 44–58.

Fryer, Luke, and Rollo Carpenter. 2006. “Bots as language learning tools.” Language Learning & Technology 10 (3): 8–14.

Hazemali, David, Janez Osojnik, Tomaž Onič, Tadej Todorović, and Mladen Borovič. 2024. “Evaluating chatbot assistance in historical document analysis.” Moderna arhivistika 7 (2): 53–83. https://doi.org/10.54356/ma/2024/biub3010.

Hellstrom, Thomas. 2024.
“AI and its consequences for the written word.” Frontiers in Artificial Intelligence 6: 1326166. https://doi.org/10.3389/frai.2023.1326166.

Hu, Krystal. 2023. ChatGPT Sets Record for Fastest-Growing User Base – Analyst Note. Reuters. https://www.reuters.com/technology/chatgpt-sets-record-fastest-growing-user-base-analyst-note-2023-02-01/.

Huang, Jingshan, and Ming Tan. 2023. “The role of ChatGPT in scientific communication: Writing better scientific review articles.” American Journal of Cancer Research 13 (4): 1148–54.

Hyland, Ken. 2003. Second Language Writing. Cambridge University Press. https://doi.org/10.1017/CBO9780511667251.

Imran, Muhammad, and Norah Almusharraf. 2023. “Analyzing the role of ChatGPT as a writing assistant at higher education level: A systematic review of the literature.” Contemporary Educational Technology 15 (4): ep464. https://doi.org/10.30935/cedtech/13605.

Jazbec, Saša. 2024. “Umetna inteligenca oziroma orodja, podprta z umetno inteligenco, pri pouku in za pouk tujih jezikov: empirična raziskava o stališčih učiteljev tujega jezika v Sloveniji.” Ars & Humanitas 18 (1): 115–30. https://doi.org/10.4312/ars.18.1.115-130.

Jen, Ling Shirley, and Abdul Rahim Salam. 2024. “Using artificial intelligence for essay writing.” Arab World English Journal (AWEJ) (April): 90–99. https://doi.org/10.24093/awej/ChatGPT.5.

Karataş, Fatih, Faramarz Yaşar Abedi, Filiz Ozek Gunyel, Derya Karadeniz, and Yasemin Kuzgun. 2024. “Incorporating AI in foreign language education: An investigation into ChatGPT’s effect on foreign language learners.” Education and Information Technologies 29 (15): 19343–66. https://doi.org/10.1007/s10639-024-12574-6.
Kasneci, Enkelejda, Kathrin Sessler, Stefan Küchemann, Maria Bannert, Daryna Dementieva, Frank Fischer, Urs Gasser, Georg Groh, Stephan Günnemann, Eyke Hüllermeier, Stephan Krusche, Gitta Kutyniok, Tilman Michaeli, Claudia Nerdel, Jürgen Pfeffer, Oleksandra Poquet, Michael Sailer, Albrecht Schmidt, Tina Seidel, Matthias Stadler, Jochen Weller, Jochen Kuhn, and Gjergji Kasneci. 2023. “ChatGPT for good? On opportunities and challenges of large language models for education.” Learning and Individual Differences 103: 102274. https://doi.org/10.1016/j.lindif.2023.102274.

Khampusaen, Dararat. 2025. “The impact of ChatGPT on academic writing skills and knowledge: An investigation of its use in argumentative essays.” LEARN Journal: Language Education and Acquisition Research Network 18 (1): 963–88. https://doi.org/10.70730/PGCQ9242.

Khatter, Sanaa. 2019. “An analysis of the most common essay writing errors among EFL Saudi female learners.” Arab World English Journal 10 (3): 364–81. https://doi.org/10.24093/awej/vol10no3.26.

Kohnke, Lucas, Benjamin Luke Moorhouse, and Di Zou. 2023a. “ChatGPT for language teaching and learning.” RELC Journal 54 (2): 537–50. https://doi.org/10.1177/00336882231162868.

Mahapatra, Santosh. 2024. “Impact of ChatGPT on ESL students’ academic writing skills: A mixed methods intervention study.” Smart Learning Environments 11: 9. https://doi.org/10.1186/s40561-024-00295-9.

Malá, Marketa, Gabriela Brůhová, and Katerina Vašků. 2022. “Reporting verbs in L1 and L2 English novice academic writing.” ELOPE: English Language Overseas Perspectives and Enquiries 19 (2): 127–47. https://doi.org/10.4312/elope.19.2.127-147.

Masoudi, Hatim. 2024. “Effectiveness of ChatGPT in improving English writing proficiency among non-native English speakers.” International Journal of Educational Sciences and Arts 3 (4): 62–84. https://doi.org/10.59992/IJESA.2023.v3n4p2.

Mun, Chae-young. 2024.
“EFL learners’ English writing feedback and their perception of using ChatGPT.” Journal of English Teaching Through Movies and Media 25 (2): 26–39. https://doi.org/10.16875/stem.2024.25.2.26.

Nguyen, Ho Huynh Bao, Ho Huynh Bao Ngoc, and Thai Cong Dan. 2024. “EFL students’ perceptions and practices of using ChatGPT for developing English argumentative essay writing skills.” European Journal of Alternative Education Studies 9 (1): 168–216. https://doi.org/10.46827/ejae.v9i1.5341.

Nguyen, Thi Thu Hang. 2023. “EFL teachers’ perspectives toward the use of ChatGPT in writing classes: A case study at Van Lang University.” International Journal of Language Instruction 2 (3): 1–47. https://doi.org/10.54855/ijli.23231.

Nguyen, Thi Yen Phuong, Nguyen Ngoc Ti, and Phan Nguyen Khanh Hoa. 2025. “The challenges of applying ChatGPT in the academic writing of postgraduate students in English major at IUH.” International Journal of AI in Language Education 2 (1): 20–37. https://doi.org/10.54855/ijaile.25212.

Nugroho, Arif, Nur Hidayanto Pancoro Setyo Putro, and Kastam Syamsi. 2023. “The potentials of ChatGPT for language learning: Unpacking its benefits and limitations.” Register Journal 16 (2): 224–47. https://doi.org/10.18326/register.v16i2.224-247.

Nunan, David. 2003. Practical English Language Teaching. McGraw Hill Education.

Özçelik, Nermin Punar, and Gonca Yangın Ekşi. 2024. “Cultivating writing skills: The role of ChatGPT as a learning assistant – a case study.” Smart Learning Environments 11: 10. https://doi.org/10.1186/s40561-024-00296-8.

Peachey, Nick. 2023. ChatGPT in the Language Classroom. Peachey Publications.

Rahmat, Noor Hanim, Mazlen Arepin, D. Rohayu Mohd Yunos, and Sharifah Amani Syed Abdul Rahman. 2017. “Analyzing perceived writing difficulties through the social cognitive theory.” PEOPLE: International Journal of Social Sciences 3 (2): 1487–99. https://doi.org/10.20319/pijss.2017.32.14871499.
Rahmi, Regina, Zahria Amalina, Andriansyah Andriansyah, and Adrian Rodgers. 2024. “Does it really help? Exploring the impact of AI-generated writing assistant on the students’ English writing.” Studies in English Language and Education 11 (2): 998–1012. https://doi.org/10.24815/siele.v11i2.35875.

Richards, Jack C., and Willy A. Renandya. 2002. Methodology in Language Teaching: An Anthology of Current Practice. Cambridge University Press. https://doi.org/10.1017/CBO9780511667190.

Rudolph, Jurgen, Samson Tan, and Shannon Tan. 2023. “ChatGPT: Bullshit spewer or the end of traditional assessments in higher education?” Journal of Applied Learning & Teaching 6 (1): 342–63. https://doi.org/10.37074/jalt.2023.6.1.9.

Sari, Eka Dyah Puspita, and Mia Fitria Agustina. 2022. “Thematic development in students’ argumentative essay.” IDEAS: Journal on English Language Teaching and Learning, Linguistics and Literature 10 (1): 166–74.

Sawangwan, Sirin. 2024. “ChatGPT vs teacher roles in developing EFL writing.” International Journal of Computer-Assisted Language Learning and Teaching (IJCALLT) 14 (1): 1–21. https://doi.org/10.4018/IJCALLT.361235.

Skrabut, Stan. 2023. 80 Ways to Use ChatGPT in the Classroom: Using AI to Enhance Teaching and Learning. Stan Skrabut.

Song, Cuiping, and Yanping Song. 2023. “Enhancing academic writing skills and motivation: Assessing the efficacy of ChatGPT in AI-assisted language learning for EFL students.” Frontiers in Psychology 14: 1–14. https://doi.org/10.3389/fpsyg.2023.1260843.

Sumakul, Dian Toar Y. G., Fuad Abdul Hamied, and Didi Sukyadi. 2021. “Students’ perceptions of the use of AI in a writing class.” Advances in Social Science, Education and Humanities Research 624: 52–57. https://doi.org/10.2991/assehr.k.220201.009.

Tica, Lena, and Ivana Krsmanović. 2024. “Overcoming the writer’s block?
Exploring students’ motivation and perspectives on using ChatGPT as a writing assistance tool in ESP.” ELOPE: English Language Overseas Perspectives and Enquiries 21 (1): 129–49. https://doi.org/10.4312/elope.21.1.129-149.

Tran, Hong Ngoc, Le Thi Thuy Ngan, and Tran Vu Bich Uyen. 2025. “AI tools in learning academic writing: Benefits and challenges for MA students in the English language studies at the Industrial University of Ho Chi Minh City.” International Journal of AI in Language Education 2 (1): 74–91. https://doi.org/10.54855/ijaile.25215.

Tran, Thi Thu Hien. 2024. “AI tools in teaching and learning English academic writing skills.” Proceedings of the AsiaCALL International Conference 4: 170–87. https://doi.org/10.54855/paic.23413.

van Dis, Eva A. M., Johan Bollen, Willem Zuidema, Robert van Rooij, and Claudi L. Bockting. 2023. “ChatGPT: Five priorities for research.” Nature 614 (7947): 224–26. https://doi.org/10.1038/d41586-023-00288-7.

Wang, Mengqian, and Wenge Guo. 2023. “The potential impact of ChatGPT on education: Using history as a rearview mirror.” ECNU Review of Education 1 (8). https://doi.org/10.1177/20965311231189826.

Yang, Yang. 2024. “An empirical study on the impact of ChatGPT on writing proficiency in Chinese EFL learners.” Curriculum and Teaching Methodology 7 (4). https://doi.org/10.23977/curtm.2024.070425.

Zirar, Araz. 2023. “Exploring the impact of language models, such as ChatGPT, on student learning and assessment.” Review of Education 11 (3): e3433. https://doi.org/10.1002/rev3.3433.

2025, Vol.
22 (1), 69-91(228) journals.uni-lj.si/elope https://doi.org/10.4312/elope.22.1.69-91 UDC: [811.111’243:378]:004.912

Rashmika Lekamge, Sabaragamuwa University of Sri Lanka, Sri Lanka
Clayton Smith, University of Windsor, Canada

Impact of Auto-Correction Features in Text-Processing Software on the Academic Writing of ESL Learners

ABSTRACT

The intrusion of technology into language education is undeniable. However, its impact on English as a Second Language (ESL) learners remains underexplored. This study explores how the text-processing and suggestion features of Microsoft Word affect the English language development of ESL learners. The writing samples show that while beginners make fewer spelling and punctuation errors, prolonged reliance on software weakens long-term language proficiency. This finding is supported by cluster analysis of first-year undergraduates, third-year undergraduates, and postgraduates. Conversely, first-year undergraduate learners excel in structuring paragraphs and writing a variety of sentences, which are areas untouched by the automation offered in the tested software. Semi-structured interviews with research-active academics and postgraduate students further validated these findings, highlighting a critical decline in writing confidence due to over-dependence on emerging technology. The study underscores the hidden costs of convenience, urging a recalibration of technology-integrated language pedagogy.

Keywords: automated writing correction, ESL development, technology dependence, writing proficiency decline, text-processing software

Vpliv funkcije samodejnega popravljanja v programih za urejanje besedil na akademsko pisanje učencev in učenk angleščine kot drugega tujega jezika

IZVLEČEK

Vdor tehnologije v učenje jezikov je nesporen, a je njen vpliv na angleščino kot drugi tuji jezik premalo raziskan.
Študija raziskuje, kako funkcije samodejnega popravljanja in predlogov v programu Microsoft Word vplivajo na razvoj znanja angleščine pri študentih in študentkah angleščine kot drugega tujega jezika. Pisni vzorci so pokazali, da začetniki naredijo manj napak pri črkovanju in ločilih, a dolgotrajna odvisnost od programske opreme oslabi dolgoročno jezikovno znanje. To potrjuje analiza prvega letnika, tretjega letnika in podiplomskih študentov in študentk. Prvi letnik se je sicer izkazal pri strukturiranju odstavkov in variiranju povedi, ki ju programska oprema ne avtomatizira. Polstrukturirani intervjuji z raziskovalci in raziskovalkami in podiplomskimi študenti in študentkami so ugotovitve potrdili ter izpostavili občuten upad samozavesti pri pisanju zaradi pretirane odvisnosti od sodobne tehnologije. Raziskava izpostavi skriti davek udobja in nujnost ponovnega uravnoteženja pedagoških pristopov pri vključevanju tehnologije v jezikovni pouk.

Ključne besede: samodejno popravljanje pri pisanju, razvoj ESL, tehnološka odvisnost, upad pisne zmožnosti, programska oprema za obdelavo besedil

1 Introduction

Ongoing social and technological development has introduced auto-correction features integrated into text processors, continuously reshaping the language competence of English as a second language (ESL) users. Auto-correction refers to digital spelling and grammar correction tools embedded in word-processing programs, which automatically detect errors and either suggest corrections or directly rectify them (Wood 2014). These features were first implemented in the 1980s as a strategy to boost the demand for computers (Cummings 2023; Kruse and Rapp 2023; Larsson and Teigland 2020; Steyn and Johanson 2011).
At the time, spell-checker programs were predicted to become a mandatory feature of future text-processing programs. The service provides users with correct spelling or grammar before they can even recognize their mistakes; later, however, it emerged that these programs negatively influenced the writing and language performance of their users (Baron 2023; Omer Ismael et al. 2022; Rüdian, Dittmeyer, and Pinkwart 2022). Scholars have identified that the auto-correction feature in word-processing programs operates through three primary mechanisms: (a) direct corrections, which provide the corrected form of an error on the spot; (b) indirect corrections, which direct users’ attention to the errors but leave users to select the correct option; and (c) metalinguistic corrections, where the program identifies the errors, labels them based on their nature and provides a brief explanation, with or without relevant examples (Barrot 2023). However, technology and language experts hold conflicting views regarding the impact of these features on the language competences of English learners beyond the inner circle of Kachru’s (1985) ‘Three Concentric Circles’ model. Accordingly, the inner circle represents countries where English is the primary language (e.g., the UK, the USA), the outer circle includes ESL contexts where English serves as an institutionalized additional language (e.g., India, Nigeria), and the expanding circle comprises EFL (English as a Foreign Language) contexts where English is learned as an international language but lacks official status (e.g., China, Saudi Arabia). The negative effects of auto-correction may be more pronounced in outer and expanding circle countries, where learners often rely on normative standards from inner-circle varieties, potentially influencing their linguistic development in ESL (norm-developing) and EFL (norm-dependent) contexts (A. Al-Mutairi 2019; Hu and Jiang 2011).
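For readers unfamiliar with these feedback types, the three correction mechanisms described above can be sketched in a few lines of code. The following is a purely illustrative toy example, not how any real word processor is implemented: the tiny dictionary, the function names, and the similarity threshold are all hypothetical choices made for this sketch.

```python
# Toy illustration of the three auto-correction mechanisms (Barrot 2023):
# (a) direct, (b) indirect, and (c) metalinguistic correction.
# Dictionary and names are hypothetical; real spell checkers are far richer.
import difflib

DICTIONARY = {"their", "there", "receive", "separate", "writing"}

def best_match(word):
    """Return the closest dictionary word, or None if nothing is similar."""
    matches = difflib.get_close_matches(word.lower(), DICTIONARY, n=1, cutoff=0.7)
    return matches[0] if matches else None

def correct(word, mode="direct"):
    if word.lower() in DICTIONARY:
        return word                       # no error detected
    suggestion = best_match(word)
    if suggestion is None:
        return word                       # unknown word, no suggestion
    if mode == "direct":                  # (a) silently replace the error
        return suggestion
    if mode == "indirect":                # (b) flag it, let the user choose
        return f"{word} [did you mean: {suggestion}?]"
    if mode == "metalinguistic":          # (c) flag, label, briefly explain
        return f"{word} [spelling error; closest dictionary form: ‘{suggestion}’]"
    return word

print(correct("recieve", "direct"))       # -> receive
print(correct("recieve", "indirect"))
print(correct("recieve", "metalinguistic"))
```

The pedagogical distinction the literature draws maps directly onto the `mode` parameter: only mode (a) removes the learner from the correction loop, which is the scenario the studies below associate with weaker internalization of spelling rules.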
One critical statement claims that this technology has ‘created a generation of dummies’ (Wood 2014, 12). On the contrary, Weigle (2013) argues that providing corrective feedback is essential, as it addresses deficiencies in students’ linguistic repertoire, particularly in advanced writing, where such deficiencies can be corrected easily. This line of research also points to quicker and potentially more effective methods for improving academic writing through auto-correction features integrated into text-processing tools. Studies have validated the advantage of the auto-correction features in text-processing applications, as they enable users to compose relatively error-free text (Neto, Bezerra, and Toselli 2020; Putze et al. 2017). Moreover, the software determines which words are most likely to have been intended and then fixes the text accordingly, which enhances accuracy in the technical aspects of writing, improves readability and consistency throughout the document, and reduces the cognitive load of writing, allowing greater focus on content and ideas (Sanchez et al. 2023).

However, continuous reliance on these features has intensified challenges for ESL and EFL learners, particularly in spelling, note-taking, and instant essay writing (Omer Ismael et al. 2022; Sanchez et al. 2023). It affects students’ writing abilities because users often fail to notice their mistakes on account of automatic correction (Kontogiannis 1999). As a result, users do not fully internalize spelling and grammar rules (Ajaj 2022), leading to weaker writing skills over time. This is particularly detrimental for students in ESL contexts since, unlike learners in inner-circle countries (Hu and Jiang 2011), they lack English language input from the broader society and environment (Saud et al. 2023).
It is thus vital to explore the impact of the auto-correction features in text-processing applications on ESL learners’ language competency. The present study focuses on the effects of auto-correction technology on the English writing skills of ESL undergraduates. Insights from the on-site test experience and interviews with research-active academics and postgraduate participants will ensure the quality and validity of the findings. The findings of this study will assist in identifying the long-term impact of text processors with auto-correction and serve as an initial step towards potential technological developments that could overcome these drawbacks. The following section reviews the existing literature relevant to our study.

2 Literature Review: Empirical Studies on Auto-Correction

While text-processing applications with automatic corrective feedback can enhance language learning, their use in ESL contexts has also shown harmful effects. Kim (2012) stated that error correction is ineffective and harmful in physically interactive learning environments. Similarly, studies have indicated that it is ineffective to expect language development through automatic correction, since the student/learner is unaware of the error he/she has committed (Truscott 1999). Despite concerns about grammar or spelling, students often fail to recognize their errors and misunderstandings. On the contrary, Rüdian et al. (2022) claim that auto-correction is a promising tool for ESL language users, as it helps to minimize the gap between teacher expectations and learner skills; nevertheless, over 66% of the errors identified by educators in their study were not detected by the auto-correction software (designed for the German language). The study also found that the software frequently flagged correct items as errors, causing confusion and undermining its reliability.
These findings highlight the need for auto-correction tools to move beyond basic proofreading toward a more comprehensive approach to language learning (Alharbi 2023). Another study found that automated feedback systems improve writing quality and outcomes, but it unveiled shortcomings in ESL/EFL contexts (Benali 2021). Further, Ferris and Roberts (2001) explored effective methods for providing error correction feedback, but their study highlighted the scarcity of existing literature on the impact of auto-correction on English language learners in ESL contexts.

Another significant study examined the impact of automatic spelling correction, focusing on learners’ awareness of how the software functions and corrects spelling mistakes, its educational value for learning English spelling, and learners’ dependence on the tool for checking their spelling (Lin, Liu, and Paas 2017). The study found that male learners were generally more competent and benefited more from the software, with the sample showing an overall positive preference for the auto-correction feature (Rahimi, Gholizadeh, and Shahryari 2019). Conversely, the study by Ali et al. (2022) found that learners who relied heavily on technology for spelling correction faced greater challenges in maintaining spelling accuracy during writing tasks. Supporting this argument, Wood (2014) highlighted a survey which revealed that students who heavily depended on smart devices performed worse in spelling than those who had less interaction with such devices and spell-check features. She further explained that millennials, as an emerging generation, often lack three essential skills: reading, writing, and metalinguistic awareness (Vodopija-Krstanović and Brala Vukanović 2015; Wood 2014). The study by Gayed et al.
(2022) explored the impact of writing tools on L2 writing proficiency, highlighting broader implications of auto-correction technology for ESL learners’ language development. While the findings indicated a positive impact on syntactic complexity, results for other measures and production rates were inconclusive. The authors attributed these inconsistencies to several factors, including the participants’ limited experience with the software, low usage rates of the word suggestion feature, and the inherent limitations of machine-based assessment tools. They argued that automated systems might fail to capture nuanced writing features, such as contextual and structural errors, which human evaluators can better identify. These findings highlight important considerations for implementing auto-correction tools in ESL contexts. While such tools may support immediate syntactic improvements, their influence on deeper language development remains uncertain. Outcomes are shaped by factors such as technological familiarity, typing proficiency, and the ability to interpret corrections. The study emphasizes the need for comprehensive training and ongoing tool refinement. Furthermore, predictive text and word suggestions may distract less proficient learners, requiring closer evaluation. Thus, auto-correction technology, though promising, must be carefully aligned with learners’ needs and proficiency levels. Sanchez (2023) studied how auto-correction tools impact students’ writing abilities across various dimensions, including vocabulary, syntax, and writing mechanics. The findings revealed marginal performance in student composition, suggesting limited mastery in key areas of writing. For instance, vocabulary assessment consistently yielded a marginal rating, indicating a dependency on auto-correction tools for word choice. 
Similar trends were observed in syntax and mechanics, where a sizeable portion of students scored poorly, indicating persistent grammatical and structural errors in their writing despite using these tools. This suggests that while auto-correction provides immediate corrective feedback and improves surface-level errors, it may inhibit learners from developing deeper linguistic skills. Furthermore, participants often relied on auto-correction tools not just for error correction but also as idea generators and time-savers, reflecting a shift in focus towards ease and efficiency over language mastery. These findings underscore that auto-correction tools boost confidence and efficiency but may hinder authentic language learning and critical thinking when overused.

Research indicates that integrating automated feedback with traditional teacher feedback improves ESL learners’ writing skills. A recent study on Turkish EFL students found that this combined approach significantly enhanced writing self-efficacy compared to traditional methods (Sari and Han 2024). This aligns with prior studies by Grimes and Warschauer (2010) and Sherafati et al. (2020), which revealed that automated writing evaluation (AWE) tools, by providing immediate, personalized feedback, promote self-efficacy and learner engagement. These systems enable students to practice without time or space constraints, thereby enhancing confidence and allowing for iterative revisions. However, mixed results were observed in other areas, such as self-regulated writing strategies and writing anxiety. While the combined feedback model did improve self-regulation, it did not significantly reduce anxiety levels, possibly because of the continued role of teacher evaluation.
Despite these variations, the immediate and individualized feedback offered by AWE systems has been shown to improve writing performance and facilitate more efficient error correction, suggesting that this hybrid feedback model could contribute positively to language development in ESL contexts. This supports the broader argument that automated feedback can foster a more student-centred and effective learning environment, enhancing both writing proficiency and the psychological factors critical to language acquisition.

Despite the existing empirical evidence, the limited scope and narrow focus of previous studies underscore the need to explore emerging trends associated with technological advancements and their impact on educational practices. A significant concern motivating this study is the overemphasis on Middle Eastern and Western contexts in the literature (Ali et al. 2022; Benali 2021; Omer Ismael et al. 2022; Wood 2014), leaving a gap in research that addresses the Asian context, where findings could be applicable across the broader ESL landscape. Moreover, the post-COVID-19 learning environment in developing countries has introduced significant changes in educational practices, policies, and technology, leading to increased reliance on technology in the teaching-learning process (Bećirović, Brdarević-Čeljo, and Delić 2021). This over-reliance on technology has created a detectable phenomenon among millennials (Shadiev and Wang 2022), which needs immediate exploration. Therefore, this study is timely, as it aims to address this research gap and identify the potential negative consequences of neglecting these issues, contributing both to academic discourse and to socially significant outcomes for future generations.

3 Methodology

3.1 Setting

This study examines the impact of the auto-correction features of text-processing applications on ESL undergraduates’ English language competences.
A pre-designed test was deployed as the research instrument to gather evidence (Ahmed 2024). Furthermore, a series of semi-structured interviews with both undergraduate and postgraduate clusters was conducted to ensure the validity and reliability of the data and to obtain more precise findings (Leung 2015). The study was conducted as a cross-sectional investigation (Maier et al. 2023), in which data were collected at a specific point in time.

3.2 Participants

This experimental study involved 197 undergraduates from a state-governed university in Sri Lanka who regularly engage in academic writing (assignment submissions, end-of-semester examinations, field report composition, spot tests, etc.). The postgraduate cluster, used for the semi-structured interview series, included twenty postgraduate students. The undergraduate cluster of the sample comprised Year I Semester I (hereafter YI SI) and Year III Semester II (hereafter YIII SII) undergraduates from the discipline of spatial sciences. The participants were selected to preserve the validity and reliability of the data (Leung 2015). According to Table 1 below, these undergraduates, aged between 19 and 26, are pursuing their Bachelor of Science in Spatial Sciences and represent both genders. Most of the sample had limited and inconsistent exposure to computers and the internet prior to entering university (Lekamge and Rajavarathan 2024). The subject General English is offered as part of the Advanced Level examination. However, the results of General English were not considered for university admission in this degree programme, leading to significant variation in language competency among undergraduate students. Table 1.
Composition and demography of the undergraduates tested in the experiment and the postgraduates who took part in the semi-structured interviews.

                  Cluster 1 – Undergraduates                         Cluster 2 – Postgraduates
Year of Study     Year I Semester I         Year III Semester II     Postgraduates
Age               19–22                     22–26                    28–40
Gender            Male – 71, Female – 28    Male – 66, Female – 32   Male – 13, Female – 07
Mother Tongue     Sinhala – 78, Tamil – 21  Sinhala – 82, Tamil – 16 Sinhala – 16, Tamil – 04

The Sri Lankan secondary education system predominantly relies on face-to-face learning and a mother tongue (MT/L1) based teaching-learning process (Lekamge, Jayathilake, and Smith 2024), contributing to a low level of computer literacy and English language proficiency for some students in the initial phase of their university education. The students come from two different first-language (L1) backgrounds: the majority speak Sinhala (L1), and the others Tamil (L1) (Prasangani 2018). To ensure consistency in instruction, the researcher served as the teacher for both the undergraduate and postgraduate clusters. Informed consent was obtained from all students before they participated, in accordance with research ethics (Wu et al. 2019).

3.3 Development of the Research Instruments and Data Collection

Once the preliminary list of errors was identified from the existing literature (Hládek, Staš, and Pleva 2020; Nejja and Yousfi 2015), the types of errors were identified and listed. Then, to ensure triangulation for validity and reliability, a pilot study was conducted through an online platform with a randomly selected group of ten undergraduates and five postgraduates. The pilot study was vital to ensure the responsiveness and applicability of the tested items (Reed et al. 2021; Vivek, Nanthagopan, and Piriyatharshan 2023). Then, the finalized error list was composed, and each minor error type was classified under a sub-category for ease of handling the results.
The required terminology and possible error types were included in the tested items. Further, the postgraduate cluster was questioned with reference to the impact caused by long-term use of this software. A. On-site test designed to gauge the real-time impact of auto-correction on writing The first research instrument was the pre-designed test targeting the undergraduates, which consisted of three tasks divided into two subcategories. Task one was designed to assess the hand-written competency of students in academic activities. The tasks aimed to assess student competence in listening to academic content and composing relevant notes, addressing potential errors identified in the pilot study. This included terminology that was subject to auto-correction and technical terms specific to the academic discipline of spatial sciences. The students had to compose a 200-word paragraph within the allocated time for each task. Task two was a computer-based test, which was completed in a notepad application with no auto-correction options. Students were given two parts under the second task and asked to submit the saved notepad answer to the link provided at the end of task two. Task three was designed to be completed in a Microsoft Word document (hereafter, a Word document), with auto-correction features. The task had two parts, requiring students to type their answers in the Word document. Each task carried the same weight as previous tasks and was related to the main academic discipline of spatial science. All three parts of this test were designed to include the required terms that helped assess the spelling and grammar concerns that were tested in the experiment (Catelly 2014; Hair et al. 2024; Ren and Seedhouse 2024). The test design is displayed in Figure 1 below. 
Figure 1. Design of the test.

B. Semi-structured interviews targeting the postgraduate cluster

A series of semi-structured interviews was conducted with twenty postgraduates who were willing to participate in an online session of 10–15 minutes. Qualitative interviews carry higher validity in triangulating the data and obtaining authentic viewpoints and experiences relevant to the study (DeJonckheere and Vaughn 2019; Magaldi and Berler 2020). Thus, the topic areas for the interview were designed with extensive reference to and adaptation of the prevailing literature. Accordingly, the interview questions covered attitudes toward auto-correction and its impact on academic writing, language learning, and development. They also addressed the accuracy and common errors of auto-correction, its effect on writing style and tone, the availability and compatibility of platforms, the challenges and limitations faced by ESL learners, its impact on confidence in English language usage and writing, and long-term changes in writing practices (DeJonckheere and Vaughn 2019; Ranalli and Yamashita 2022; Sanosi 2022; Wei, Wang, and Dong 2023).

3.4 The Procedure of the Study

The first research instrument (the on-site test) involved seven major steps.
Initially, a suitable testing approach was designed for the experiment, with three tasks to test each aspect of the research. To obtain the expected outcomes, the test was divided into three sections, each with its own significance to the main objective of the study:

a) The first part involved handwritten tasks to assess students’ on-site writing competence without technology support. This aimed to reveal their independent language proficiency and confidence, isolating language skills from computer literacy and typing speed.

b) The second part involved notepad-typed content to assess students’ language errors without auto-correction support, though results could be influenced by typing speed and computer literacy.

c) The third part involved a composition typed in Microsoft Word, which offers auto-correction, to identify language issues beyond the software’s corrective capabilities.

These three test sections allowed the researcher to identify gaps at each phase, with comparisons revealing how students increasingly rely on technology-enhanced tools without developing independent language competences. For the on-site test, participants were selected from YI SI and YIII SII undergraduates. The interview series targeted postgraduate students with extensive experience using text-processing software for academic, professional, and research purposes, aiming to explore the long-term effects of continuous software exposure and to enhance the validity of the on-site test results (Kakarash 2023; Morse et al. 2002; Van der Loo and de Jonge 2020). The majority of YI SI students are not well exposed to computer literacy and related technology (De Silva, Kodikara, and Somarathne 2014; Gamage and Halpin 2007; Lekamge and Rajavarathan 2024). The YI SI group was selected for their limited exposure to auto-correction and text-processing software, while the postgraduate students, with the longest exposure, provided stronger validation for the findings.
The analysis focused on how continuous software exposure affects the writing performance of ESL learners. YIII SII students, with considerable exposure to online learning and text-processing tools, offer an intermediate perspective. Thus, the three clusters ensure data accuracy across varying stages of technological exposure (Van der Loo and de Jonge 2020). The third phase involved task preparation, obtaining approvals, and informing students of the test procedures. Postgraduate interviewees were notified two weeks in advance, briefed on the question areas, and invited to participate voluntarily. The fourth and most challenging phase was on-site test administration, conducted by three instructors. Instructions were clearly communicated at the beginning and end of each task, with specific time allocations. Upon completing each task, students submitted their responses via a Google Form. The collected scripts were then reviewed, and purposeful sampling was used to select complete submissions from ninety-eight participants (Naderifar, Goli, and Ghaljaie 2017). The researchers used content analysis to examine error patterns in each task of the undergraduate on-site test (Amnuai 2020; Salehi and Bahrami 2018). The interview scripts were first translated into English and then analysed using qualitative thematic analysis (Braun and Clarke 2006; Kiger and Varpio 2020; Naeem et al. 2023; Rosairo 2023; J. Singh and Eisenschenk 2021) to generate codes, which were later transformed into the themes of the study. Finally, the data from the on-site test and interviews were manually examined to detect existing phenomena with supporting evidence. The data were qualitatively analysed through error analysis and thematic analysis: error types were identified in each test phase, and the findings were compared using descriptive thematic analysis (Baxter 1991; Braun and Clarke 2006; Kiger and Varpio 2020).
4 Discussion and Analysis

The discussion and analysis are organized around two primary concerns: (i) to establish the influence of the automatic spelling correction feature on the writing skills of English language learners in ESL contexts, and (ii) to assess the impact of identifying misspelt words and providing suggestions, as well as the role of grammatical suggestions, as displayed in Table 2. The primary features offered by the software (MS Word) are automatic spelling and grammar correction, as well as suggestions for misspellings, vocabulary, stylistic issues, and other writing errors. Based on the overall performance of the participants, the study identified various persistent error types at each phase of the analysis. These error types were categorized into subcategories and main categories to facilitate data processing and analysis (Figure 2). The categories are as follows: (a) grammar errors – subject-verb agreement, tense consistency, pronoun reference, and sentence structure; (b) spelling errors – grapheme, transposition, substitutions, insertions, omissions, case sensitivity, double letters, and homophones; (c) punctuation errors – misplaced or missing commas, apostrophe errors, and missing periods; (d) vocabulary errors – incorrect word usage, redundancy, clichés, and jargon; (e) stylistic errors – lack of clarity, tone and register issues, and insufficient sentence variety; (f) other errors – repetition, paragraph structure issues, improper citations, and incompleteness.

Figure 2. Identified error categories among the sample (YI SI and YIII SII undergraduates).

Since the focus of this study is the impact of auto-correction features in text-processing applications on the language development of learners in the ESL context, unrelated concerns and error types were excluded from the analysis.
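For readers interested in replicating the categorization step, the six-part taxonomy above can be represented as a simple mapping from main categories to subcategories and used to tally observations per script. This is a minimal illustrative sketch, not the instrument used in the study; the category names follow the list above, and the sample counts are invented for demonstration.

```python
# Illustrative sketch of the section 4 error taxonomy as a data structure.
# Category and subcategory names follow the paper's list (a)-(f); the example
# observations below are invented purely for demonstration.
from collections import Counter

ERROR_TAXONOMY = {
    "grammar": ["subject-verb agreement", "tense consistency",
                "pronoun reference", "sentence structure"],
    "spelling": ["grapheme", "transposition", "substitutions", "insertions",
                 "omissions", "case sensitivity", "double letters", "homophones"],
    "punctuation": ["misplaced or missing commas", "apostrophe errors",
                    "missing periods"],
    "vocabulary": ["incorrect word usage", "redundancy", "cliches", "jargon"],
    "stylistic": ["lack of clarity", "tone and register",
                  "insufficient sentence variety"],
    "other": ["repetition", "paragraph structure", "improper citations",
              "incompleteness"],
}

def tally_by_category(observed_errors):
    """Aggregate subcategory-level observations into main-category counts."""
    lookup = {sub: main for main, subs in ERROR_TAXONOMY.items() for sub in subs}
    return Counter(lookup[e] for e in observed_errors if e in lookup)

# Invented example: errors noted in one hand-written script.
sample = ["omissions", "missing periods", "tense consistency", "omissions"]
print(tally_by_category(sample))  # Counter({'spelling': 2, 'punctuation': 1, 'grammar': 1})
```

Collapsing subcategory observations into the six main categories in this way is what makes the per-task and per-cohort comparisons reported below tractable.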
4.1 Impact of Automatic Correction on Language Performance

The comparison of the three tasks (hand-written, notepad-based, and Word document-based) revealed certain limitations in automatic spelling correction. Specifically, the software does not correct all spelling errors made by users, as it relies on a pre-installed auto-correction word list, which reveals the limited effectiveness of the software (Fitria 2021). While the software often corrects mistakes automatically without user awareness, this feature poses drawbacks for long-term users (Hiscox, Leonavičiūtė, and Humby 2014). Over time, their attention to key language aspects, such as spelling and punctuation, may diminish (Fan and Ma 2022). These findings were further validated by the postgraduate sample. All postgraduate interview participants concurred that prolonged and continuous exposure to text-processing software significantly contributed to spelling and punctuation errors when operating in non-technological environments, as it made them heavily dependent on the technology. These elements are crucial in academic writing, underscoring the profound impact of long-term software use (Brenner et al. 2021; Merzifonluoğlu and Takkaç Tulgar 2023). Table 2 presents compelling evidence from graduates who experienced serious effects from auto-correction features over time. Notably, these participants occupy a unique transitional period, having witnessed the introduction of technology into education. Graduates who pursued their undergraduate studies in the late 20th and early 21st centuries had substantial engagement with hard copies of books and documents and with manual notetaking using pen and paper. This method indirectly fostered their cognitive and language development processes (Baker 1994; Dickinson et al. 2012). In contrast, the evidence presented by the sample highlights a clear distinction between the pre- and post-effects of reliance on text-processing tools.
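The word-list limitation discussed at the opening of Section 4.1 can be sketched with a toy model of list-based auto-correction. This is an assumption-laden illustration, not MS Word's actual algorithm: the replacement table and the example sentence (including the misspelt technical term "georeferncing") are invented. The point it shows is the one the interviewees raise: "known" misspellings are silently rewritten, while out-of-vocabulary technical terms pass through unflagged.

```python
# Toy model of list-based auto-correction (NOT MS Word's real algorithm).
# It only fixes misspellings present in its pre-installed table, which is
# exactly the limitation discussed above: anything outside the list is left
# alone, and the writer is never shown the changes that are made.
AUTOCORRECT_TABLE = {  # invented pre-installed word list
    "teh": "the",
    "recieve": "receive",
    "seperate": "separate",
}

def autocorrect(text):
    corrected, silent_fixes = [], []
    for word in text.split():
        if word.lower() in AUTOCORRECT_TABLE:
            corrected.append(AUTOCORRECT_TABLE[word.lower()])
            silent_fixes.append(word)   # the user is not alerted to this fix
        else:
            corrected.append(word)      # unknown errors pass through untouched
    return " ".join(corrected), silent_fixes

text = "teh georeferncing step must recieve valid coordinates"
fixed, fixes = autocorrect(text)
print(fixed)   # "the georeferncing step must receive valid coordinates"
print(fixes)   # ['teh', 'recieve']; the misspelt technical term was untouched
```

Under this sketch, a writer sees a mostly clean sentence, receives no signal about the two silent fixes, and gets no help with the discipline-specific term, which mirrors the dependence and missed-learning effects the postgraduate interviewees describe.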
The Sri Lankan primary and secondary education system, which traditionally emphasized physical, hand-written work, contributed positively to students’ spelling and language accuracy and incorporated metacognitive concerns (Baker 1994). However, as students progressed into tertiary education and gained more exposure to digital technology, the anticipated language improvements stagnated or declined.
A comparative analysis of the hand-written content of YI SI and YIII SII students provided additional insights. The YI SI cohort, with less exposure to text-processing software, made fewer errors in spelling and punctuation. The reason is the impact of the traditional learning mode during their school years, with a greater focus on pen-and-paper based writing, extracting, and learning (Vičič 2020). In contrast, the YIII SII students, who relied extensively on technology over an extended period (three or more years at the university during the COVID-19 pandemic), showed no marked improvement in their hand-written assessments. This pattern suggests that extensive dependence on auto-correction tools diminishes the language awareness of students, leading to persistent errors that hinder academic writing precision (Baker 1994).

Table 2. Themes developed from the semi-structured interviews with postgraduate students.

Theme: Limitations and challenges in using auto-correction in text-processing applications

A. “sometimes, we do work or research on a specific field, … But when we insert some technical terms, the Word app does not accept those and automatically provides some other words with a different meaning …”

B. “Because scientific terms, technical terms and other subject-specific acronyms are not provided in Word, so we need to add those terms into the dictionary manually and again, these terms get inter-mingled with other terms later, and sometimes it automatically corrects the terms with other words according to its pre-installed rules.”

C. “Word document has no connection with the writing style and the tone of the composition. The Word software does not address these stylistic concerns …”

Theme: Coping mechanisms for overcoming limitations

D. “Hmm … yes, I think I have lost my confidence in spelling when compared to school days and undergraduate days. During my undergraduate years, I am talking nearly 15 years back, okay? We did everything manually, drafted maps manually, wrote our field reports manually, submitted our assignments manually, composed tutorials and reports manually, referred to hard copies of books and documents, noted things manually and referred to hard copies of books in the library. From all those activities, I was so confident about my language, which improved drastically once I started my undergraduate degree, and reading the prescribed books in the library and noting and extracting things manually seriously impacted my language development … But once I started doing my master’s and other research and academic works with technology, let’s say after 2010 onwards, I strongly used computers and used softcopies of documents and started copying and pasting, which is easier and more efficient but has drastically impacted my language competency, … I feel like I am stuck and not any development … impact of software must have contributed a dominant proportion to this dependency.”

E. “… Thus, I use the Word software always, but now I use AI tools more often to finetune my language and writing style. I would draft the initial content in a Word document and then use the available AI tools to enhance the quality of the writing style and omit the grammatical errors.”

Theme: Impact of text-processing software on language development among ESL context users

F. “rather than for language development, it makes it easier and more efficient to handle the situation. Actually, we do not usually focus on language development; … But … we identify that some sort of grammatical suggestions provided in the word app make us aware of our mistakes. For some users, this can be helpful in identifying that they have grammar mistakes. I don’t think it is a successful way to develop language skills.”

G. “it has an impact on language development, but it is very limited, right? It provides some suggestions and corrections in some instances, but that is not sufficient and a good way to proceed. But, if someone is intentionally trying to learn through every minute instance as an opportunity to develop their language, then this can be considered as a sufficient way to develop language skills. But I don’t know whether Word can cover all grammar concerns, especially when it comes to complex, longer sentences. However, for learners at the beginner level, to compose a composition with fewer errors, Word is beneficial.”

H. “I believe that Word automatically corrects major grammar errors like subject-verb agreement, and spelling errors, capitalization stuff and period related concerns, but this automatic correction might harm learners because the user has no idea whether he has written correctly because prior to detection, the software itself corrects the error. However, I admit that the suggestions provided by Word for grammar and spelling concerns have a severe impact on learning language. Because the suggestion or explanation can lead the user towards a better understanding of the language. On the other hand, automatic correction can lead users towards lack of awareness about writing mechanics that are mandatory for academic concerns.”

I. “However, auto-correction …, is not a good thing. Because, I personally must admit that the worst scenario is my experience: I once participated in a spelling competition during my school days, and I was very good at English compared to my colleagues back then. However, now, the changes that have happened to me are actually very bad. Now I am not confident of my spelling capacity, and I always recheck it with an available online tool. Even though working as an academic and a researcher, I am continuously engaging with text processors. But I recognized that whenever I have to write something using pen and paper, I get stuck with spelling, seriously.”
Figure 4 illustrates the most relevant error types that the automatic correction feature directly addresses, often without user awareness. These error types include spelling and punctuation; the most significant sub-categories are omission errors, case sensitivity issues, double letters in spelling, misplaced or missing commas, apostrophes, and the absence of periods. The postgraduate cluster provides evidence that continued exposure to text processors seriously affects the writing mechanics of users, as is evident in the answers of Interviewees H and I in Table 2. Accordingly, the long-term exposure of this middle-aged cluster, who experienced the transition from the conventional mode of learning to the flipped mode after the pandemic, indicated a deterioration in their spelling confidence due to prolonged entanglement with technology-enhanced writing tools (Reed et al. 2021; Wei, Wang, and Dong 2023). When comparing the three tests, most errors were found in the tasks completed using the notepad, possibly owing to typing speed, technical literacy, and related challenges (Van Waes et al. 2021). A comparison of the handwritten content with the text-processed content showed a significant reduction in most error types, particularly no-period errors, case sensitivity issues, and omissions (Figure 3). This suggests that text-processing software has a positive short-term impact on producing more accurate written work (Van Der Steen, Samuelson, and Thomson 2017). However, its long-term use negatively affects human autonomy, language competence, and user confidence (Bickmore and Picard 2005; Schaefer et al. 2016). Year-based analysis highlights the significant impact of punctuation errors. As shown in Figure 3, punctuation errors were notably reduced in text-processed content compared to the handwritten and notepad tasks.
However, many third-year undergraduates continued to exhibit punctuation errors in handwritten work, likely because of prolonged reliance on automatic correction, which conceals such mistakes (Van Waes et al. 2021). Consequently, the quality of their hand-written work shows minimal improvement over that of newly enrolled undergraduates. A key factor underlying this trend is the foundational cognitive development that occurs during school years. Activities that integrate traditional methods of writing played a critical role in establishing robust language mechanics (Kellogg 2008). However, the transition to university-level education, particularly during the COVID-19 pandemic, drastically altered the learning environment. Conventional, in-person educational practices in Sri Lanka were abruptly replaced by rigid online modes, causing a significant disruption without sustainable curricular adjustments and required facilities (Lekamge and Rajavarathan 2024). This shift fostered heavy dependence on digital tools, leaving students without essential skills in traditional writing practices and increasingly reliant on technology for language functions. The results suggest that prolonged reliance on auto-correction may delay language learners’ ability to produce error-free compositions, diminishing their innate writing accuracy over time, as reflected in the persistence of minor errors (Omer Ismael et al. 2022). This finding is further supported by the postgraduate cluster, who reported losing confidence in producing error-free content. The long-term impact of heavy technological dependence, particularly the auto-correction feature in text-processing applications, negatively affects user confidence (Bickmore and Picard 2005; Schaefer et al. 2016).
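The single-edit spelling sub-categories counted in this study (omission, insertion, substitution, transposition, double letters, case sensitivity) can be made concrete with a small classifier. The following is an illustrative sketch only, written for this discussion; the function name and the rule for attributing a dropped half of a doubled letter are our own assumptions, not the study’s coding instrument or MS Word’s detection logic.

```python
# Illustrative sketch (our own coding, not the study's instrument or
# MS Word's engine): classify a misspelling against its intended form
# using the single-edit sub-categories counted in the paper.

def classify_spelling_error(typed: str, intended: str) -> str:
    if typed == intended:
        return "no error"
    if typed.lower() == intended.lower():
        return "case sensitivity"
    t, i = typed.lower(), intended.lower()
    if len(t) + 1 == len(i):                      # one letter missing
        for k in range(len(i)):
            if t == i[:k] + i[k + 1:]:
                # a dropped half of a doubled letter is counted
                # separately from a plain omission (an assumption)
                doubled = (k > 0 and i[k] == i[k - 1]) or \
                          (k + 1 < len(i) and i[k] == i[k + 1])
                return "double letters" if doubled else "omission"
    if len(t) == len(i) + 1:                      # one letter extra
        for k in range(len(t)):
            if i == t[:k] + t[k + 1:]:
                return "insertion"
    if len(t) == len(i):
        diffs = [k for k in range(len(t)) if t[k] != i[k]]
        if len(diffs) == 1:
            return "substitution"
        if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and t[diffs[0]] == i[diffs[1]]
                and t[diffs[1]] == i[diffs[0]]):
            return "transposition"
    return "other"                                # multi-edit errors

print(classify_spelling_error("teh", "the"))         # transposition
print(classify_spelling_error("runing", "running"))  # double letters
```

Counting tallies of these categories over a set of (typed, intended) pairs is one way the per-year error profiles reported in Figures 3 and 4 could be reproduced from raw scripts.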
Figure 3. Detected errors (by year): How exposure to technology has affected the language performance of undergraduates.

4.2 Impact of Identifying and Emphasizing Errors to the User

The second feature of text-processing applications highlights incorrect words, grammatical issues, and other language concerns, offering suggestions for correction (Kukich 1992; S. Singh and Singh 2018). According to the results, this feature covers a broad range of errors: grammar rule-related errors, spelling errors, vocabulary-related errors, and other error types. This feature positively impacts language learners in the ESL context by identifying areas that need improvement and raising awareness of their mistakes (Wood 2014). This capability is useful in highlighting errors in answer scripts, allowing learners to develop their language skills through cognitive engagement (Pinet and Nozari 2022; Rüdian, Dittmeyer, and Pinkwart 2022; Sherafati, Largani, and Amini 2020). When users click on a highlighted error, the software offers accurate suggestions, aiding conscious correction. Users can also add technical terms to the dictionary or auto-correction list, benefiting academic writers by allowing them to focus more on content rather than on language errors (Khansir 2012).
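The dictionary and custom-term mechanism described above can be sketched in a few lines. This is a hedged illustration, not MS Word’s actual implementation or API: the word list, function names, and the use of Python’s difflib similarity matching are assumptions chosen to show why an unknown technical term gets flagged and matched to a look-alike until the user adds it to a custom dictionary.

```python
# Hypothetical sketch of a dictionary-based checker: tokens missing
# from the dictionary are flagged and mapped to the closest known
# word, which is how a technical term can be "corrected" into a
# similar-looking ordinary word.
from difflib import get_close_matches

BASE_DICTIONARY = {"the", "economics", "analysis", "of", "data"}

def check(text, custom_terms=frozenset()):
    """Return a dict mapping unknown tokens to suggested replacements."""
    known = BASE_DICTIONARY | {t.lower() for t in custom_terms}
    issues = {}
    for token in text.split():
        word = token.strip(".,;:").lower()
        if word and word not in known:
            # the top-ranked look-alike is what silent auto-correction
            # would substitute for the unknown term
            issues[word] = get_close_matches(word, known, n=1)
    return issues

# The technical term "genomics" is unknown, so the checker proposes
# the look-alike dictionary word "economics".
print(check("the genomics analysis"))
# Once the user adds the term to the custom dictionary, it is accepted.
print(check("the genomics analysis", custom_terms={"genomics"}))  # {}
```

The second call illustrates the manual dictionary additions the interviewees describe: the burden of keeping `custom_terms` current falls entirely on the user, which is the feasibility problem noted below.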
However, the postgraduate cluster revealed that the software often fails to recognize technical terms and acronyms, automatically replacing these with similar-looking alternatives (Table 2 – Interviewees A and B). To avoid this, users must manually add scientific terms to the software dictionary (Cook and Jensen 2019; Salton and Lesk 1968). However, manual additions are not feasible for all users on account of their varying levels of comfort with technology, and the fact that continuous dictionary updates are time-consuming. Even so, the feature serves as a positive catalyst for language learners in ESL contexts, as it integrates cognitive processes with the task (Goonawardena et al. 2022; Rüdian, Dittmeyer, and Pinkwart 2022; Wood 2014). By purposefully correcting highlighted concerns, users experience a significant impact on their cognitive processes, leading to a conscious awareness of errors (Ellis et al. 2008; Pinet and Nozari 2022). Additionally, the provision of grammar suggestions enhances learners’ theoretical understanding of grammar rules (Ellis et al. 2008; Ji and Liu 2018). Consequently, the suggestion-provider feature enhances language accuracy by reducing the potential harm the software would otherwise cause to language users.

For further clarification, twenty answer scripts were randomly chosen for review based on mistakes highlighted by the Word software, as displayed in Figure 5. Regardless of study phase, most errors were in spelling (above 300 occurrences) and punctuation (above 250 occurrences). A significant finding, however, is that the software fails to identify stylistic aspects and some vocabulary errors (Putze et al. 2017; Rüdian, Dittmeyer, and Pinkwart 2022; Shadiev and Wang 2022). MS Word does not address writing style, paragraph structure, coherence, or cohesion. Long-term users of the software report that they incorporate larger-scale AI tools to enhance their writing style, coherence, and cohesion (Tica and Krsmanović 2024).
This clearly prompts software developers to enhance the features in text-processing software (Alharbi 2023; Benali 2021; Gayed et al. 2022; Jajić Novogradec 2021; Ranalli and Yamashita 2022; Salton and Lesk 1968). Compared to the Word document-based task, YIII SII students showed a reduction in error occurrences, with the greatest decrease in other error types such as repetition and incomplete terms. Punctuation errors (apostrophe and comma issues) followed in reduction, while grammar errors, including tense and subject-verb agreement, showed the third most significant improvement (Fan and Ma 2022; Ferris and Roberts 2001). However, spelling errors show the smallest difference across the years of study, reflecting students’ limited proficiency in technical and subject-related terminology. These terminology-related spelling errors are common in their hand-written content and are displayed in Figure 5.

As shown in Figure 5, most errors were found in the YI SI cluster. However, YIII SII students committed notably more punctuation errors in the handwritten task, with a count of 221, while other error types were lower compared to the YI SI cluster. Apart from that, stylistic errors were not detected by the text-processing software (Gröndahl and Asokan 2020; Stamatatos, Fakotakis, and Kokkinakis 2000), resulting in no recorded values for these errors in Figure 4. However, the researcher identified numerous stylistic errors in the hand-written content (Figure 5), particularly in the subcategories of clarity, tone, register, and sentence variety. Incorrect word usage was a recurring error in the sample. However, the software failed to detect it, because the pre-installed logic lacks the capacity to interpret semantic accuracy or assess the higher-level cognitive attributes involved in human language processing (Rüdian, Dittmeyer, and Pinkwart 2022; Salton and Lesk 1968; Wood 2014).
Hence, this limitation highlights a significant gap in current text-processing applications. However, the most recent developments in artificial intelligence (AI) have begun to address this concern, as evidenced by the data obtained from the postgraduate cluster (Table 2). Another noteworthy observation is that the software did not identify issues related to paragraph structure and organization. These issues were highly apparent in the YI SI cluster but were significantly less prominent in the YIII SII cluster. This insight further supports the fact that, with gradual exposure to the language through CLIL (Content and Language Integrated Learning), students make noticeable progress in their language development. This valuable insight emerged as a by-product of the study and is worthy of further exploration.

Figure 4. Number of occurrences of error types as highlighted on the task sheet (completed using text-processing software).

Another significant finding is that students in the YI SI cluster demonstrate stronger language skills in their hand-written work (spelling basic terminology and punctuation use). As these students gradually gain exposure to subject-specific terminology and technical terminology, the frequency of errors decreases, as observed in the YIII SII cluster.
In contrast, students in the YIII SII cluster exhibit a higher frequency of punctuation errors, likely due to continued reliance on text-processing software. The most noticeable effects of this reliance become apparent during hand-written assessments, where automated support is unavailable. These findings are also reinforced by triangulated data from the postgraduate cluster, emphasizing the pervasive impact on writing mechanics. The data illustrate that this issue is not isolated but a widespread concern that affects many users. Prolonged dependence on auto-correction features has resulted in a gradual decline in essential language competences, signalling an urgent need for intervention to preserve and enhance human cognitive abilities in language functions (Brenner et al. 2021; Huseinović 2022; Merzifonluoğlu and Takkaç Tulgar 2023).

Another notable finding identified in the study is the recurrent use of abbreviated forms and informal texting language in the hand-written content of both clusters (Booton, Hodgkiss, and Murphy 2023; Dwivedi et al. 2023; Genlott and Grönlund 2013; Jonsson and Blåsjö 2020). This phenomenon adversely affects their academic writing, often introducing an inappropriately informal tone into formal writing samples. The evidence indicates that text-processing applications exert both positive and negative influences on the academic language development of ESL learners (Alharbi 2021; Mahapatra 2024). Thus, the current study highlights the importance of language instructors and educational practitioners strategically leveraging the beneficial aspects of these tools to enhance the linguistic competence of learners. Simultaneously, it suggests that software developers should incorporate features that actively engage cognitive processes, promoting more effective language development rather than focusing on the efficiency of text production (Booton, Hodgkiss, and Murphy 2023; Jia et al. 2019; Khan et al. 2023).
Figure 5. Year-wise comparison of error occurrences: A comparison between the hand-written content and the text-processing application-based content.
5 Conclusion

The study reveals that text-processing software (MS Word) with features such as auto-correction and error suggestions can provide valuable support in reducing certain language errors and increasing user focus on quality content. However, it also poses long-term challenges to language learners in the ESL context by making the users more dependent on the software and eventually diminishing writing mechanics like punctuation and spelling.
Since the ESL context provides little English input from the surrounding society, learners’ encounters and interaction with English-language content are minimal. In such a context, modern technology offers an indirect path to obtaining English language content. Thus, potential software developments should address avenues to enhance language development. However, prolonged reliance on these tools can diminish student awareness of spelling and punctuation errors and delay the development of independent writing skills. Although the software enhances grammar and vocabulary accuracy, it fails to address stylistic and structural matters, which are crucial for academic writing. Moreover, informal language habits reinforced by frequent technology use can negatively impact academic writing quality. These findings suggest the need for balanced integration of technology in language education, emphasizing the importance of developing both technical accuracy and stylistic proficiency in ESL learners.

References

A. Al-Mutairi, Mohammad. 2019. “Kachru’s three concentric circles model of English language: An overview of criticism & the place of Kuwait in it.” English Language Teaching 13 (1): 85. https://doi.org/10.5539/elt.v13n1p85.

Ahmed, Sirwan Khalid. 2024. “The pillars of trustworthiness in qualitative research.” Journal of Medicine, Surgery, and Public Health 2: 100051. https://doi.org/10.1016/j.glmedi.2024.100051.

Ajaj, Israa Eibead. 2022. “Investigating the difficulties of learning English grammar and suggested methods to overcome them.” Journal of Tikrit University for Humanities 29 (6): 45–58. https://doi.org/10.25130/jtuh.29.6.2022.24.

Alharbi, Sultan H. 2021. “The struggling English language learners: Case studies of English language learning difficulties in EFL context.” English Language Teaching 14 (11): 108. https://doi.org/10.5539/elt.v14n11p108.

Alharbi, Wael. 2023.
“AI in the foreign language classroom: A pedagogical overview of automated writing assistance tools.” Education Research International 2023: 1–15. https://doi.org/10.1155/2023/4253331.

Ali, Hewa Fouad, Lisa Jamal Nakshbandi, Fatima Saadi, and Sami Hussein Hakeem Barzani. 2022. “The effect of spell-checker features on spelling competence among EFL Learners: An empirical study.” International Journal of Social Sciences & Educational Studies 9 (3): 101–11. https://doi.org/10.23918/ijsses.v9i3p101.

Amnuai, Wirada. 2020. “An error analysis of research project abstracts written by Thai undergraduate students.” Advances in Language and Literary Studies 11 (4): 13.

Baker, Linda. 1994. “Fostering metacognitive development.” Advances in Child Development and Behavior 25: 201–39. https://doi.org/10.1016/S0065-2407(08)60053-1.

Balla, Ervin. 2023. “Impact of technology in acquisition of English language.” Journal of Educational and Social Research 13 (1): 134–45. https://doi.org/10.36941/jesr-2023-0012.

Baron, Dennis E. 2023. A Better Pencil: Readers, Writers, and the Digital Revolution. 1st ed. Oxford University Press.

Barrot, Jessie S. 2023. “Using automated written corrective feedback in the writing classrooms: Effects on L2 writing accuracy.” Computer Assisted Language Learning 36 (4): 584–607. https://doi.org/10.1080/09588221.2021.1936071.

Baxter, L.A. 1991. Content Analysis. The Guilford Press.

Bećirović, Senad, Amna Brdarević-Čeljo, and Haris Delić. 2021. “The use of digital technology in foreign language learning.” SN Social Sciences 1 (10): 246. https://doi.org/10.1007/s43545-021-00254-y.

Benali, Ameni. 2021. “The impact of using automated writing feedback in ESL/EFL classroom contexts.” English Language Teaching 14 (12): 189–95. https://doi.org/10.5539/elt.v14n12p189.

Bickmore, Timothy W., and Rosalind W. Picard. 2005.
“Establishing and maintaining long-term human-computer relationships.” ACM Transactions on Computer-Human Interaction 12 (2): 293–327. https://doi.org/10.1145/1067860.1067867.

Booton, Sophie A., Alex Hodgkiss, and Victoria A. Murphy. 2023. “The impact of mobile application features on children’s language and literacy learning: A systematic review.” Computer Assisted Language Learning 36 (3): 400–429. https://doi.org/10.1080/09588221.2021.1930057.

Bozavli, Ebubekir. 2023. “The relationship between the use of technology and technology addiction in learning foreign language.” Arab World English Journal 14 (3): 418–30. https://doi.org/10.24093/awej/vol14no3.27.

Braun, Virginia, and Victoria Clarke. 2006. “Using thematic analysis in psychology.” Qualitative Research in Psychology 3 (2): 77–101. https://doi.org/10.1191/1478088706qp063oa.

Brenner, Maria, Denise Alexander, Mary Brigid Quirke, Jessica Eustace-Cook, Piet Leroy, Jay Berry, Martina Healy, Carmel Doyle, and Kate Masterson. 2021. “A systematic concept analysis of ‘technology dependent’: Challenging the terminology.” European Journal of Pediatrics 180 (1): 1–12. https://doi.org/10.1007/s00431-020-03737-x.

Catelly, Yolanda-Mirela. 2014. “Optimizing language assessment – Focus on test specification and piloting.” Procedia – Social and Behavioral Sciences 128 (April): 393–98. https://doi.org/10.1016/j.sbspro.2014.03.177.

Cook, Helen V., and Lars Juhl Jensen. 2019. “A guide to dictionary-based text mining.” In Bioinformatics and Drug Discovery, 3rd ed., edited by Richard S. Larson and Tudor I. Oprea, 73–89. Springer. https://doi.org/10.1007/978-1-4939-9089-4_5.

Cummings, Lance. 2023. “Writing processes in the digital age: A networked interpretation.” In Digital Writing Technologies in Higher Education, edited by Otto Kruse, Christian Rapp, Chris M. Anson, Kalliopi Benetos, Elena Cotos, Ann Devitt, and Antonette Shibani, 485–97. Springer International Publishing. https://doi.org/10.1007/978-3-031-36033-6_30.
De Silva, W. Indralal, Pamoda Kodikara, and Ruwani Somarathne. 2014. “Sri Lankan youth and their exposure to computer literacy.” Sri Lanka Journal of Advanced Social Studies 3 (1): 27–52. https://doi.org/10.4038/sljass.v3i1.7127.

DeJonckheere, Melissa, and Lisa M. Vaughn. 2019. “Semistructured interviewing in primary care research: A balance of relationship and rigour.” Family Medicine and Community Health 7 (2): e000057. https://doi.org/10.1136/fmch-2018-000057.

Dickinson, David K., Julie A. Griffith, Roberta Michnick Golinkoff, and Kathy Hirsh-Pasek. 2012. “How reading books fosters language development around the world.” Child Development Research 2012: 602807. https://doi.org/10.1155/2012/602807.

Dwivedi, Yogesh K., Nir Kshetri, Laurie Hughes, Emma Louise Slade, Anand Jeyaraj, Arpan Kumar Kar, Abdullah M. Baabdullah, et al. 2023. “Opinion paper: ‘So what if ChatGPT wrote it?’ Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy.” International Journal of Information Management 71: 102642. https://doi.org/10.1016/j.ijinfomgt.2023.102642.

Ellis, Rod, Younghee Sheen, Mihoko Murakami, and Hide Takashima. 2008. “The effects of focused and unfocused written corrective feedback in an English as a foreign language context.” System 36 (3): 353–71. https://doi.org/10.1016/j.system.2008.02.001.

Fan, Ning, and Yingying Ma. 2022. “The effects of automated writing evaluation (AWE) feedback on students’ English writing quality: A systematic literature review.” Language Teaching Research Quarterly 28: 53–73. https://doi.org/10.32038/ltrq.2022.28.03.

Ferris, Dana, and Barrie Roberts. 2001. “Error feedback in L2 writing classes.” Journal of Second Language Writing 10 (3): 161–84. https://doi.org/10.1016/S1060-3743(01)00039-X.

Fitria, Tira Nur. 2021.
“Grammarly as AI-powered English writing assistant: Students’ alternative for writing English.” Metathesis: Journal of English Language, Literature, and Teaching 5 (1): 65–78.

Gamage, Premila, and Edward F. Halpin. 2007. “E‐Sri Lanka: Bridging the digital divide.” The Electronic Library 25 (6): 693–710. https://doi.org/10.1108/02640470710837128.

Gayed, John Maurice, May Kristine Jonson Carlon, Angelu Mari Oriola, and Jeffrey S. Cross. 2022. “Exploring an AI-based writing assistant’s impact on English language learners.” Computers and Education: Artificial Intelligence 3: 100055. https://doi.org/10.1016/j.caeai.2022.100055.

Genlott, Annika Agélii, and Åke Grönlund. 2013. “Improving literacy skills through learning reading by writing: The iWTR method presented and tested.” Computers & Education 67 (September): 98–104. https://doi.org/10.1016/j.compedu.2013.03.007.

Goonawardena, Mithma, Ashini Kulatunga, Raveena Wickramasinghe, Thisuraka Weerasekara, Hansi De Silva, and Samantha Thelijjagoda. 2022. “Automated spelling checker and grammatical error detection and correction model for Sinhala language.” In 2022 International Research Conference on Smart Computing and Systems Engineering (SCSE), Colombo, Sri Lanka, 184–89. IEEE. https://doi.org/10.1109/SCSE56529.2022.9905126.

Grimes, Douglas, and Mark Warschauer. 2010. “Utility in a fallible tool: A multi-site case study of automated writing evaluation.” The Journal of Technology, Learning and Assessment 8 (6): 4–42. https://ejournals.bc.edu/index.php/jtla/article/view/1625.

Gröndahl, Tommi, and N. Asokan. 2020. “Text analysis in adversarial settings: Does deception leave a stylistic trace?” ACM Computing Surveys 52 (3): 1–36. https://doi.org/10.1145/3310331.

Hair, Joseph F., Pratyush N. Sharma, Marko Sarstedt, Christian M. Ringle, and Benjamin D. Liengaard. 2024. “The shortcomings of equal weights estimation and the composite equivalence index in PLS-SEM.” European Journal of Marketing 58 (13): 30–55.
https://doi.org/10.1108/EJM-04-2023-0307.

Hiscox, Lucy, Erika Leonavičiūtė, and Trevor Humby. 2014. “The effects of automatic spelling correction software on understanding and comprehension in compensated dyslexia: Improved recall following dictation.” Dyslexia 20 (3): 208–24. https://doi.org/10.1002/dys.1480.

Hládek, Daniel, Ján Staš, and Matúš Pleva. 2020. “Survey of automatic spelling correction.” Electronics 9 (10): 1670. https://doi.org/10.3390/electronics9101670.

Hu, Betsy Xiaoqiong, and Xianxing Jiang. 2011. “Kachru’s three concentric circles and English teaching fallacies in EFL and ESL contexts.” Changing English 18 (2): 219–28. https://doi.org/10.1080/1358684X.2011.575254.

Huseinović, Lamija. 2022. “The relationship between digital competency, learning styles and learners’ perception of traditional versus technology-assisted language learning.” MAP Education and Humanities 3 (1): 17–30. https://doi.org/10.53880/2744-2373.2022.2.3.17.

Jajić Novogradec, Marina. 2021. “Positive and negative lexical transfer in English vocabulary acquisition.” ELOPE: English Language Overseas Perspectives and Enquiries 18 (2): 139–65. https://doi.org/10.4312/elope.18.2.139-165.

Ji, Chunyi, and Qi’ang Liu. 2018. “A study on the effectiveness of English grammar teaching and learning in Chinese junior middle schools.” Theory and Practice in Language Studies 8 (11): 1553–58. https://doi.org/10.17507/tpls.0811.24.

Jia, Jingdong, Xiaoying Yang, Rong Zhang, and Xi Liu. 2019. “Understanding software developers’ cognition in agile requirements engineering.” Science of Computer Programming 178: 1–19. https://doi.org/10.1016/j.scico.2019.03.005.

Jonsson, Carla, and Mona Blåsjö. 2020. “Translanguaging and multimodality in workplace texts and writing.” International Journal of Multilingualism 17 (3): 361–81. https://doi.org/10.1080/14790718.2020.1766051.

Kakarash, Zana Azeez. 2023. “Why is data validation important in research?” ResearchGate.
https://doi.org/10.13140/RG.2.2.34496.81920.

Kellogg, Ronald T. 2008. “Training writing skills: A cognitive developmental perspective.” Journal of Writing Research 1 (1): 1–26. https://doi.org/10.17239/jowr-2008.01.01.1.

Khan, Aashiq, Irum Zeb, Yan Zhang, and Tahir. 2023. “Impact of emerging technologies on cognitive development: The mediating role of digital social support among higher education students.” IJERI: International Journal of Educational Research and Innovation 20: 1–15. https://doi.org/10.46661/ijeri.8362.

Khansir, Ali Akbar. 2012. “Error analysis and second language acquisition.” Theory and Practice in Language Studies 2 (5): 1027–32. https://doi.org/10.4304/tpls.2.5.1027-1032.

Kiger, Michelle E., and Lara Varpio. 2020. “Thematic analysis of qualitative data: AMEE guide no. 131.” Medical Teacher 42 (8): 846–54. https://doi.org/10.1080/0142159X.2020.1755030.

Kim, Hye-Kyung. 2012. “The effectiveness of correcting grammatical errors in writing classes: An EFL teacher’s perspective.” International Journal of Literacy, Culture, and Language Education 1: 227–37. https://doi.org/10.14434/ijlcle.v1i0.26836.

Kontogiannis, Tom. 1999. “User strategies in recovering from errors in man–machine systems.” Safety Science 32 (1): 49–68. https://doi.org/10.1016/S0925-7535(99)00010-7.

Kruse, Otto, and Christian Rapp. 2023. “Word processing software: The rise of MS Word.” In Digital Writing Technologies in Higher Education, edited by Otto Kruse, Christian Rapp, Chris M. Anson, Kalliopi Benetos, Elena Cotos, Ann Devitt, and Antonette Shibani, 15–32. Springer International Publishing. https://doi.org/10.1007/978-3-031-36033-6_2.

Kukich, Karen. 1992. “Techniques for automatically correcting words in text.” ACM Computing Surveys 24 (4): 377–439. https://doi.org/10.1145/146370.146380.

Larsson, Anthony, and Robin Teigland, eds. 2020.
The Digital Transformation of Labor: Automation, the Gig Economy and Welfare. Routledge.

Lekamge, Rashmika, Chitra Jayathilake, and Clayton Smith. 2024. “Language-related barriers and insights to overcome the challenges of English medium instructed learning environment for undergraduates.” International Journal of Current Education Studies 3 (1): 28–53. https://doi.org/10.5281/zenodo.12193460.

Lekamge, Rashmika, and Jenan Rajavarathan. 2024. “Enhancing academic writing proficiency among English as a second language users at the undergraduate level: A comparative analysis of student-lecturer perspectives and strategies.” Journal of Research and Education 10 (1): 37–76.

Leung, Lawrence. 2015. “Validity, reliability, and generalizability in qualitative research.” Journal of Family Medicine and Primary Care 4 (3): 324. https://doi.org/10.4103/2249-4863.161306.

Lin, Po-Han, Tzu-Chien Liu, and Fred Paas. 2017. “Effects of spell checkers on English as a second language students’ incidental spelling learning: A cognitive load perspective.” Reading and Writing 30 (7): 1501–25. https://doi.org/10.1007/s11145-017-9734-4.

Magaldi, Danielle, and Matthew Berler. 2020. “Semi-structured interviews.” In Encyclopedia of Personality and Individual Differences, edited by Virgil Zeigler-Hill and Todd K. Shackelford, 4825–30. Springer International Publishing. https://doi.org/10.1007/978-3-319-24612-3_857.

Mahapatra, Santosh. 2024. “Impact of ChatGPT on ESL students’ academic writing skills: A mixed methods intervention study.” Smart Learning Environments 11 (1): 9. https://doi.org/10.1186/s40561-024-00295-9.

Maier, Christian, Jason Bennett Thatcher, Varun Grover, and Yogesh K. Dwivedi. 2023. “Cross-sectional research: A critical perspective, use cases, and recommendations for IS research.” International Journal of Information Management 70: 102625. https://doi.org/10.1016/j.ijinfomgt.2023.102625.

Merzifonluoğlu, Ayşe, and Ayşegül Takkaç Tulgar. 2023.
“The effect of technology-supported language learning on communication competencies.” Erzincan Üniversitesi Eğitim Fakültesi Dergisi 25 (3): 524– 37. https://doi.org/10.17556/erziefd.1334195. Morse, Janice M., Michael Barrett, Maria Mayan, Karin Olson, and Jude Spiers. 2002. “Verification strategies for establishing reliability and validity in qualitative research.” International Journal of Qualitative Methods 1 (2): 13–22. https://doi.org/10.1177/160940690200100202. Naderifar, Mahin, Hamideh Goli, and Fereshteh Ghaljaie. 2017. “Snowball sampling: A purposeful method of sampling in qualitative research.” Strides in Development of Medical Education 14 (3). https://doi.org/10.5812/sdme.67670. 89ACADEMIC WRITING Naeem, Muhammad, Wilson Ozuem, Kerry Howell, and Silvia Ranfagni. 2023. “A step-by-step process of thematic analysis to develop a conceptual model in qualitative research.” International Journal of Qualitative Methods 22: 16094069231205789. https://doi.org/10.1177/16094069231205789. Nejja, Mohammed, and Abdellah Yousfi. 2015. “The context in automatic spell correction.” Procedia Computer Science 73: 109–14. https://doi.org/10.1016/j.procs.2015.12.055. Neto, Arthur Flor De Sousa, Byron Leite Dantas Bezerra, and Alejandro Héctor Toselli. 2020. “Towards the natural language processing as spelling correction for offline handwritten text recognition systems.” Applied Sciences 10 (21): 7711. https://doi.org/10.3390/app10217711. Omer Ismael, Kozhin, Kochar Ali Saeed, Airin Shwan Ibrahim, and Diya Shawkat Fatah. 2022. “Effects of auto-correction on students’ writing skill at three different universities in Sulaimaneyah City.” Arab World English Journal 8: 231–45. https://doi.org/10.24093/awej/call8.16. Pinet, Svetlana, and Nazbanou Nozari. 2022. “Correction without consciousness in complex tasks: Evidence from typing.” Journal of Cognition 5 (1): 11. https://doi.org/10.5334/joc.202. Prasangani, Kariyawasam Sittarage. 2018. 
“English language education in Sri Lanka Link with the learners’ motivational factors.” HLT Magazine, August. Putze, Felix, Maik Schünemann, Tanja Schultz, and Wolfgang Stuerzlinger. 2017. “Automatic classification of auto-correction errors in predictive text entry based on EEG and context information.” In Proceedings of the 19th ACM International Conference on Multimodal Interaction, 137–45. Association for Computing Machinery. https://doi.org/10.1145/3136755.3136784. Rahimi, Mehrak, Gholamreza Gholizadeh, and Ali Shahryari. 2019. “Iranian EFL learners’ perceptions about automatic spelling correction software use for learning English spellings: A study with focus on gender.” International Journal of English Language and Translation Studies 7 (1): 68–75. Ranalli, Jim, and Taichi Yamashita. 2022. “Automated written corrective feedback: Error correction performance and timing of delivery.” Language Learning & Technology 26 (1): 1–25. http:// hdl.handle.net/10125/73465. Reed, M. S., M. Ferré, J. Martin-Ortega, R. Blanche, R. Lawford-Rolfe, M. Dallimer, and J. Holden. 2021. “Evaluating impact from research: A methodological framework.” Research Policy 50 (4): 104147. https://doi.org/10.1016/j.respol.2020.104147. Ren, Simin, and Paul Seedhouse. 2024. “Doing language testing: Learner-initiated side sequences in a technology-mediated language learning environment.” Classroom Discourse 15 (4): 317–52. https:// doi.org/10.1080/19463014.2024.2305446. Rosairo, H. S. R. 2023. “Thematic analysis in qualitative research.” Journal of Agricultural Sciences – Sri Lanka 18 (3). https://doi.org/10.4038/jas.v18i3.10526. Rüdian, Leo Sylvio, Moritz Dittmeyer, and Niels Pinkwart. 2022. “Challenges of using auto-correction tools for language learning.” In LAK22: 12th International Learning Analytics and Knowledge Conference, 426–31. Association for Computing Machinery. https:// doi.org/10.1145/3506860.3506867. Salehi, Mohammad, and Ava Bahrami. 2018. 
“An error analysis of journal papers written by Persian authors.” Cogent Arts & Humanities 5 (1): 1537948. https://doi.org/10.1080/23311983.2018.1537948. Salton, G., and M. E. Lesk. 1968. “Computer evaluation of indexing and text processing.” Journal of the ACM 15 (1): 8–36. https://doi.org/10.1145/321439.321441. Sanchez, Avigail T., Kairille France R. Arcila, Jerime L. Baldomero, Kianna Mhae P. Cahanding, Rachelle Anne A. De Leon, and Catherine A. Samson. 2023. “Roles of auto-correction tools on HUMSS students’ writing skills.” Proceedings of International Interdisciplinary Conference on Sustainable Developments Goals (IICSDGs) 6 (1): 99–113. Sanosi, Abdulaziz B. 2022. “The impact of automated written corrective feedback on EFL learners’ academic writing accuracy.” Journal of Teaching English for Specific and Academic Purposes 10 (2): 301– 17. https://doi.org/10.22190/JTESAP2202301S. Sari, Elif, and Turgay Han. 2024. “The impact of automated writing evaluation on English as a foreign language learners’ writing self‐efficacy, self‐regulation, anxiety, and performance.” Journal of Computer Assisted Learning 40 (5): 2065–80. https://doi.org/10.1111/jcal.13004. 90 Rashmika Lekamge, Clayton Smith Impact of Auto-Correction Features in Text-Processing Software on the Academic ... Saud, Jefriyanto, Lela Susanty, Petrus Jacob Pattiasina, Satriani, and Wajnah. 2023. “Exploring the influence of the environment on students’ second language acquisition: A comprehensive psycholinguistic study.” RETORIKA: Jurnal Ilmu Bahasa 9 (2): 174–84. https:// doi.org/10.55637/jr.9.2.7724.174-184. Schaefer, Kristin E., Jessie Y. C. Chen, James L. Szalma, and P. A. Hancock. 2016. “A meta-analysis of factors influencing the development of trust in automation: Implications for understanding autonomy in future systems.” Human Factors: The Journal of the Human Factors and Ergonomics Society 58 (3): 377–400. https://doi.org/10.1177/0018720816634228. Shadiev, Rustam, and Xun Wang. 2022. 
“A review of research on technology-supported language learning and 21st century skills.” Frontiers in Psychology 13: 897689. https:// doi.org/10.3389/fpsyg.2022.897689. Sherafati, Narjis, Farzad Mahmoudi Largani, and Shahrzad Amini. 2020. “Exploring the effect of computer-mediated teacher feedback on the writing achievement of Iranian EFL learners: Does motivation count?” Education and Information Technologies 25 (5): 4591–4613. https:// doi.org/10.1007/s10639-020-10177-5. Singh, Jitendra, and Tracy Eisenschenk. 2021. “A thematic analysis of the attitudes and perceptions of faculty towards inclusion of interprofessional education in healthcare curriculum.” International Journal of Health Sciences Education 8 (1). https://doi.org/10.59942/2325-9981.1117. Singh, Shashank, and Shailendra Singh. 2018. “Review of real-word error detection and correction methods in text documents.” In 2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA), 1076–81. IEEE. https://doi.org/10.1109/ICECA.2018.8474700. Stamatatos, Efstathios, Nikos Fakotakis, and George Kokkinakis. 2000. “Automatic text categorization in terms of genre and author.” Computational Linguistics 26 (4): 471–95. https:// doi.org/10.1162/089120100750105920. Steyn, Jacques, and Graeme Johanson. 2011. ICTs and Sustainable Solutions for the Digital Divide: Theory and Perspectives. Information Science Reference. Tica, Lena, and Ivana Krsmanović. 2024. “Overcoming the writer’s block? Exploring students’ motivation and perspectives on using ChatGPT as a writing assistance tool in ESP.” ELOPE: English Language Overseas Perspectives and Enquiries 21 (1): 129–49. https://doi.org/10.4312/elope.21.1.129-149. Toppelberg, Claudio O., and Brian A. Collins. 2010. “Language, culture, and adaptation in immigrant children.” Child and Adolescent Psychiatric Clinics of North America 19 (4): 697–717. https:// doi.org/10.1016/j.chc.2010.07.003. Truscott, John. 1999. 
“The case for ‘the case against grammar correction in L2 writing classes’: A response to Ferris.” Journal of Second Language Writing 8 (2): 111–22. https://doi.org/10.1016/S1060- 3743(99)80124-6. Van der Loo, Mark P. J., and Edwin de Jonge. 2020. “Data validation.” In Wiley StatsRef Statistics Reference Online. https://doi.org/10.1002/9781118445112.stat08255. Van Der Steen, Steffie, Dianne Samuelson, and Jennifer M. Thomson. 2017. “The effect of keyboard-based word processing on students with different working memory capacity during the process of academic writing.” Written Communication 34 (3): 280–305. https://doi.org/10.1177/0741088317714232. Van Waes, Luuk, Mariëlle Leijten, Jens Roeser, Thierry Olive, and Joachim Grabowski. 2021. “Measuring and assessing typing skills in writing research.” Journal of Writing Research 13 (1): 107–53. https:// doi.org/10.17239/jowr-2021.13.01.04. Vičič, Polona. 2020. “A fully integrated approach to blended language learning.” ELOPE: English Language Overseas Perspectives and Enquiries 17 (2): 219–38. https://doi.org/10.4312/elope.17.2.219-238. Vivek, Ramakrishnan, Yogarajah Nanthagopan, and Sarmatha Piriyatharshan. 2023. “Beyond methods: Theoretical underpinnings of triangulation in qualitative and multi-method studies.” SEEU Review 18 (2): 105–22. https://doi.org/10.2478/seeur-2023-0088. 91ACADEMIC WRITING Vodopija-Krstanović, Irena, and Maja Brala Vukanović. 2015. “Students of today changing English language studies of yesterday.” ELOPE: English Language Overseas Perspectives and Enquiries 12 (2): 175–89. https://doi.org/10.4312/elope.12.2.175-189. Wei, Ping, Xiaosai Wang, and Hui Dong. 2023. “The impact of automated writing evaluation on second language writing skills of Chinese EFL learners: A randomized controlled trial.” Frontiers in Psychology 14: 1249991. https://doi.org/10.3389/fpsyg.2023.1249991. Weigle, Sara Cushing. 2013. 
2025, Vol. 22 (1), 93–109
journals.uni-lj.si/elope
https://doi.org/10.4312/elope.22.1.93-109
UDC: [811.111’243:378(594)]:004.89

Tommy Hastomo, Andini Septama Sari, Utami Widiati, Francisca Maria Ivone, Evynurul Laily Zen
State University of Malang, Indonesia

Muhammad Fikri Nugraha Kholid
Raden Intan State Islamic University Lampung, Indonesia

Does Student Engagement with Chatbots Enhance English Proficiency?

ABSTRACT

This study examines how Indonesian university students’ engagement with chatbots influences their English proficiency. While AI tools are increasingly used in language education, little research focuses on chatbot interaction dynamics. The research assesses behavioural (active use), cognitive (perceived value), and emotional (attitudinal) engagement across 150 non-English majors at four proficiency levels (A1–B2). Data from engagement surveys and proficiency tests were analysed using ANOVA, correlation, and regression. Results indicated that higher-proficiency students (B1/B2) engaged more intensely with chatbots than their lower-level peers. Behavioural and cognitive engagement strongly correlated with improved language skills, while emotional engagement showed no significant link.
Regression analysis identified behavioural and cognitive engagement as key predictors of proficiency gains, suggesting that active interaction and perceived utility of chatbots drive language development. The findings underscore chatbots’ potential as effective language-learning aids.

Keywords: AI, chatbots, cognitive engagement, emotional engagement, English proficiency, Indonesian university students

Ali uporaba pogovornih sistemov prispeva k izboljšanju znanja angleščine pri študentih in študentkah?

IZVLEČEK

Študija preučuje vpliv uporabe pogovornih sistemov na znanje angleščine pri indonezijskih študentih in študentkah. Čeprav se ta orodja vse pogosteje uporabljajo pri učenju jezikov, je raziskav o njihovi interakciji malo. Raziskava zajema vedenjsko (aktivna uporaba), kognitivno (zaznana koristnost) in čustveno (odnosno) vključenost pri 150 sodelujočih, ki ne študirajo angleščine, na štirih ravneh znanja jezika (A1–B2). Podatki iz anket in testov znanja so analizirani s pomočjo ANOVA ter korelacijske in regresijske analize. Rezultati so pokazali, da študenti in študentke z višjo ravnjo znanja (B1/B2) pogosteje in intenzivneje uporabljajo pogovorne sisteme. Vedenjska in kognitivna vključenost močno korelirata z izboljšanjem jezikovnih spretnosti, medtem ko čustvena vključenost nima pomembnega vpliva. Regresijska analiza je pokazala, da sta vedenjska in kognitivna vključenost ključna napovednika napredka v znanju angleščine, kar kaže, da sta aktivna uporaba in zaznana koristnost pogovornih sistemov glavna dejavnika pri učenju jezika. Ugotovitve potrjujejo potencial pogovornih sistemov kot učinkovitih učnih pripomočkov.

Ključne besede: umetna inteligenca, pogovorni sistemi, kognitivna vključenost, čustvena vključenost, znanje angleškega jezika, indonezijski študenti
1 Introduction

In Indonesia, English proficiency has become increasingly important as the country continues to engage with the global economy and participate in international collaborations. As global networks expand and the demand for skilled professionals rises, English becomes a key tool for communication and knowledge exchange (Ward and Given 2019). Universities and higher education institutions in Indonesia face the challenge of ensuring that their graduates are competitive on the world stage, which requires a strong command of the English language (Logli 2016). Graduates with advanced English proficiency are better positioned to secure employment, collaborate on international projects, and participate in global research initiatives. However, the level of English proficiency among Indonesian students often falls short of expectations, creating a significant gap in their ability to succeed in these global contexts. Consequently, developing practical educational tools and strategies to improve English proficiency has become a pressing priority for educators and policymakers in Indonesia.

One innovative solution that has garnered attention is using Artificial Intelligence (AI)-powered chatbots to support language learning. These chatbots, engineered to replicate human-like dialogue, provide students with a dynamic and tailored educational experience that can be adjusted to individual requirements (Waziana et al. 2024). Chatbots can provide real-time feedback, guide students through language exercises, and offer practice in a safe, low-pressure environment. This feature proves especially advantageous for individuals who experience apprehension or reluctance when speaking English in group settings, as these tools create a supportive environment conducive to language practice.
Current research highlights that AI-based conversational agents foster higher levels of student engagement and motivation by delivering adaptable, interactive learning experiences beyond those achievable in standard teacher-led environments (Alsawaier 2018; Huang, Hew, and Fryer 2022). Additionally, these chatbots can help students practice language skills at their own pace, promoting self-regulated learning and fostering deeper language acquisition (Chang et al. 2023). By providing immediate feedback, these tools support the iterative process of learning, which is essential for mastering a language. Nevertheless, while AI-powered conversational agents are increasingly integrated into educational frameworks, empirical investigation into their targeted effects on English language proficiency development remains notably limited. Although current research has investigated the broader pedagogical advantages of AI-driven technologies in learning environments (Slamet 2024; Waziana et al. 2024; Gayed et al. 2022; Nurchurifiani et al. 2025; Zulianti et al. 2024), very few empirical studies have focused on the correlation between learner interaction with AI conversational systems and measurable advancements in English linguistic proficiency. The effectiveness of these tools in enhancing learners’ English proficiency, especially in comparison to traditional methods of language instruction, remains an area that requires further investigation. Scholarly inquiries have examined the pedagogical applications of chatbots in stimulating learner motivation and enriching educational outcomes (Kim, Cha, and Kim 2021; Silitonga et al. 2023), but there is a gap in understanding how sustained engagement with chatbots leads to measurable improvement in language proficiency. Further empirical investigation is warranted to determine the efficacy
of chatbots in facilitating linguistic competence, especially in the context of non-English-speaking countries like Indonesia. Furthermore, the methodological frameworks and integration strategies underpinning AI chatbot interfaces within language acquisition pedagogy continue to undergo iterative refinement. While some chatbots have been designed to focus on speaking and writing practice, others offer more general support for reading and listening skills. The diversity of AI tools available on the market presents a challenge for educators and researchers trying to pinpoint which features and functions of these tools are most beneficial for improving language proficiency. Studies suggest that chatbots with personalized feedback mechanisms, goal-setting features, and adaptive learning paths are more likely to enhance student engagement and language development (Huang, Hew, and Fryer 2022; Chang et al. 2023). These findings underscore the necessity for specialized scholarly investigation into strategies for optimizing chatbots to facilitate discrete components of linguistic acquisition, including but not limited to syntactic accuracy, lexical expansion, phonological precision, and communicative fluency. Despite these promising developments, challenges remain in integrating chatbots into mainstream language education. Scholarly discourse has raised concerns that excessive dependence on technological systems within educational settings may inadvertently diminish the frequency and quality of direct interpersonal engagement among educators and fellow learners (Zou et al. 2023; Hastomo, Mandasari, and Widiati 2024). Moreover, while chatbots offer substantial utility in facilitating language practice, they are inherently limited in replicating the nuanced dynamics and depth inherent to human communicative exchanges.
Consequently, educators retain an indispensable role in scaffolding learners’ linguistic development, despite the proliferation of sophisticated artificial intelligence applications. This necessitates a paradigm shift towards strategically incorporating these technologies into pedagogical frameworks to augment rather than supplant conventional instructional approaches.

This research aims to address the gap in understanding the role of chatbots in enhancing English proficiency by investigating the relationship between Indonesian university students’ engagement with AI-powered chatbots and improvement in their English language skills. By focusing on Indonesian university students in an EFL context, this study seeks to contribute valuable insights into the potential of AI tools to address the English proficiency gap in Indonesia. Specifically, the study will examine how students’ engagement with chatbots correlates with improved English proficiency. By addressing these objectives, the research aims to advance scholarly discourse on AI’s transformative potential within educational paradigms, particularly its capacity to redefine language acquisition methodologies in digitally mediated environments. The investigation is structured around the following research questions:

1. How does the engagement of Indonesian university students with chatbots differ by English proficiency level?
2. How does the engagement of Indonesian university students with chatbots correlate with their English proficiency?
3. What predictive role does the engagement of Indonesian university students with chatbots play in their English proficiency?

2 Literature Review

2.1 Student Engagement in Language Learning

Student engagement refers to students’ cognitive, emotional, and behavioural involvement in their learning activities.
Cognitive engagement involves the mental effort students apply to understanding and integrating new information, while emotional engagement reflects students’ feelings about, and attitudes and motivations towards learning. Behavioural engagement is observable through active participation and consistent task effort (Al-Obaydi et al. 2023). Engagement in language acquisition processes is pivotal, as it facilitates knowledge retention, cultivates analytical reasoning, and strengthens problem-resolution capabilities among learners. The more engaged students are, the better they can grasp complex language concepts, such as grammar, vocabulary, and pronunciation. Active participation in classroom activities, whether speaking, listening, reading, or writing, fosters deeper learning. Moreover, engaged students tend to have a positive attitude toward language learning, which increases their persistence and resilience in overcoming challenges. Emotional engagement also affects students’ motivation, creating a sense of belonging and interest in the subject. Therefore, language educators must design engaging lessons that stimulate students’ intellectual curiosity and emotional connection to the language. By cultivating learner engagement, educators can establish an interactive educational setting that encourages proactive involvement, enabling students to collaboratively shape and invest in their academic development. Research indicates that increased student engagement leads to better language learning outcomes. For instance, when students actively participate in discussions, role-playing activities, or group projects, they are more likely to develop stronger communication skills and language fluency (Hastomo et al. 2024). Engagement enhances academic performance and students’ ability to interact effectively in real-life situations, making it essential for second language acquisition. 
Additionally, when students are emotionally engaged in the subject, they pursue language learning beyond the classroom, thus improving their overall proficiency. Behavioural engagement, such as practicing language skills outside class or seeking feedback, also supports ongoing improvement in language proficiency. Studies have shown that engaged students are more willing to use language learning tools and participate in extracurricular language-related activities, such as clubs or online forums. Consequently, the role of student engagement in language learning cannot be overstated, as it directly correlates with academic success and language mastery. Language teachers who integrate strategies to foster engagement, such as interactive activities and personalized learning experiences, contribute significantly to their students’ development (Moreira et al. 2018). Engaging students through diverse activities enhances their motivation and language acquisition process, thus improving proficiency. One effective way to increase student engagement in language learning is by integrating technology, such as AI tools. These tools offer customizable instructional trajectories, thereby accommodating learners’ capacity to autonomously modulate their progression rates in alignment with current competency thresholds (Oktarin et al. 2024). These tools can provide immediate feedback and guidance, enhancing students’ cognitive engagement. Gamification and interactive features facilitate emotional engagement, which makes language learning more enjoyable. Additionally, chatbots can adapt instructional materials in alignment with learner performance metrics, thereby sustaining an equilibrium between cognitive rigor and developmental feasibility. This adaptability encourages continuous engagement, as students can progress without feeling overwhelmed or bored.
Thus, integrating AI technology, such as chatbots, into language learning offers exciting opportunities to improve student engagement. It provides a unique solution to cater for diverse learning styles and abilities, ensuring all students have the necessary resources and support to succeed.

2.2 Chatbots as a Language Learning Tool

AI chatbots have become a prominent tool for language learning because of their accessibility, flexibility, and personalization capabilities. These tools serve as accessible, interactive modalities for autonomous language skill development beyond formal instructional environments. Contemporary platforms such as ChatGPT, Gemini, and Perplexity scaffold adaptive pedagogical frameworks, empowering learners to refine lexical acquisition, syntactic mastery, and compositional fluency through a self-regulated pace of progression (Waziana et al. 2024). These tools allow students to receive immediate feedback on their language production, which helps improve their writing and speaking proficiency. Chatbots are available 24/7, providing learners with consistent opportunities to practice without time or location limitations. This round-the-clock availability encourages continuous learning, making language practice more integrated into students’ daily lives. Additionally, chatbots support self-regulated learning by offering students a structured yet flexible learning environment. Through task-based activities, students can engage in meaningful conversations or writing exercises that align with their learning needs and goals. Therefore, chatbots contribute to an enhanced learning experience by offering a combination of accessibility, flexibility, and personalized feedback that supports language acquisition. One significant advantage of chatbots is their ability to personalize learning experiences. These tools can tailor exercises and tasks to suit individual learners’ proficiency levels and learning styles.
For example, Duolingo adjusts the difficulty of exercises based on how well a student performs, ensuring that the learning experience remains challenging but not overwhelming (Sari, Hastomo, and Nurchurifiani 2023). ChatGPT, on the other hand, allows users to ask questions or engage in conversations in English, providing real-time, context-aware responses that help improve language skills (Slamet 2024). These personalized features increase students’ motivation by ensuring they can practice at a level appropriate for their abilities. Moreover, chatbots give students a sense of autonomy, as they can choose when and how to engage with the tool (Shikun et al. 2024). This autonomy enhances their emotional engagement by cultivating a perception of control over their educational trajectory. By providing immediate, personalized feedback, these tools also foster cognitive engagement, encouraging students to reflect on their mistakes and improve their language proficiency over time. As a result, chatbots are valuable tools that can enhance language learning by providing flexibility, personalization, and immediate feedback. Despite the advantages of AI chatbots, challenges remain regarding their integration into language learning. Studies have raised concerns about the limitations of AI in understanding complex human emotions or providing nuanced feedback (Casal and Kessler 2023; Rudolph, Tan, and Tan 2023; Thorp 2023; Baskara 2023). While AI can assist in grammar and vocabulary exercises, it may struggle with understanding the subtleties of conversational language, such as tone or cultural context. Furthermore, chatbots are not a replacement for human interaction, which is essential for developing communicative competence in language learning.
Therefore, while AI tools like chatbots offer valuable assistance in practice, they should be used alongside traditional language instruction to provide a balanced learning experience. Additionally, learners may experience frustration if they encounter AI limitations, thus affecting their motivation and engagement. Educators must ensure that students are aware of these limitations and guide them in using chatbots effectively to complement their language learning. In sum, chatbots have emerged as a transformative innovation in language pedagogy, delivering customized, readily accessible, and adaptable educational frameworks that substantially enhance learners’ linguistic competences.

2.3 Student Engagement and English Proficiency

The Common European Framework of Reference for Languages (CEFR) is widely recognized as the principal standard for evaluating English language proficiency, stratifying linguistic ability into six sequential tiers: A1 (Basic User), A2 (Elementary), B1 (Intermediate), B2 (Upper Intermediate), C1 (Advanced), and C2 (Mastery). The CEFR tiers serve as indicators of learners’ receptive and productive competences in English, spanning from foundational to advanced mastery (Kim 2021). The A1 tier denotes foundational linguistic capabilities, whereas C2 approximates native-like mastery in both fluency and accuracy. By offering a systematic methodology for assessing communicative abilities, the CEFR enables instructors to pinpoint developmental needs and design targeted pedagogical interventions aligned with learners’ proficiency trajectories. The link between student engagement and English proficiency is well-documented, as engaged students are more likely to advance through these levels (Karabiyik 2019). When cognitively engaged in challenging tasks, students develop a deeper understanding and mastery of the language.
Emotional engagement, fuelled by interest and motivation, encourages students to persist through complex tasks, improving their proficiency over time. Behavioural engagement, such as practicing English outside the classroom, further reinforces this development. Empirical research underscores a significant correlation between learner agency in educational processes and enhanced linguistic competence, as sustained cognitive engagement and deliberate practice catalyse the assimilation of target language structures (Hastomo and Septiyana 2022). Engagement in learning activities significantly contributes to language proficiency, especially when students actively participate in meaningful tasks. For example, students who engage in regular writing practice or group discussion tend to improve their communication skills and achieve higher proficiency levels (Yu, Jiang, and Zhou 2020). Task-based learning, where students are encouraged to use language in practical contexts, fosters deeper engagement and accelerates language acquisition. When students engage with authentic materials, such as news articles, movies, or social media content, they are exposed to real-world language use, contributing to more natural and functional language proficiency. Additionally, interaction with native speakers or engagement with AI-driven platforms such as chatbots offer learners meaningful avenues to refine their oral and written communication skills through structured practice and contextual feedback. The more frequently students engage in these activities, the more likely they are to progress through the CEFR levels. Thus, student engagement is essential for improving language skills and achieving higher proficiency levels as measured by the CEFR framework. Research has also shown that engagement can influence the speed at which students progress through the CEFR levels.
Research indicates that learners exhibiting consistent engagement and active participation in language acquisition demonstrate accelerated progression through proficiency tiers compared to peers with limited involvement (Tian and Zhou 2020; Shen et al. 2023). This is particularly evident in language proficiency exams, where students with higher engagement levels often score better. Additionally, integrating technology, such as chatbots, can further enhance student engagement and accelerate their progression through the CEFR levels. By offering personalized practice opportunities and immediate feedback, AI tools can help students improve their language skills more efficiently. Therefore, fostering student engagement is key to ensuring progress and success in language learning, as it directly influences the development of English proficiency across various CEFR levels.

3 Research Methodology

This study aimed to examine and quantify the correlations between Indonesian university students’ engagement with chatbots and their English proficiency, using a quantitative research design. The research design is illustrated in Figure 1.

Figure 1. Research design.
3.1 Participants

The participants in this study are university students from three public universities in Lampung (Sumatra), Malang (Java), and Pontianak (Kalimantan), representing three major islands in Indonesia. A total of 150 students from these institutions were selected for the study, all of whom met the inclusion criteria: they are undergraduate students who do not major in English and have varying levels of English proficiency as defined by the CEFR. The participants are categorized into four proficiency levels: A1 (beginner), A2 (elementary), B1 (intermediate), and B2 (upper-intermediate). Specifically, 72 students are at the A1 level (48.0%), 50 students at the A2 level (33.3%), 22 students at the B1 level (14.7%), and six students at the B2 level (4.0%). Demographically, the sample reflects regional diversity: Lampung represents a semi-urban area in southern Sumatra with moderate IT infrastructure; Malang is an urban educational hub in East Java with relatively advanced technological accessibility; and Pontianak is a city in West Kalimantan with developing IT resources. Including students across these geographically and socioeconomically distinct regions enables the study to account for variability in technological access, a critical factor in AI chatbot adoption. The students, being non-English majors, were chosen because their interaction with English was typically less frequent, providing valuable insight into how chatbots might influence students who are not primarily focused on language learning. By selecting this diverse sample spanning multiple Indonesian islands, the study aims to explore how engagement with chatbots can impact language development in a non-intensive English learning environment, while acknowledging contextual limitations in technological accessibility.
3.2 Instruments

This study utilized two primary instruments for data collection: the Engagement Questionnaire and the English Proficiency Test. The Engagement Questionnaire was adapted from Xu and Li (2024) and aimed to evaluate the degree of student engagement in using chatbots for language learning, focusing on three critical aspects: behavioural, cognitive, and emotional engagement. Behavioural engagement was assessed by measuring the frequency and duration of chatbot usage, while cognitive engagement explored student perceptions of the chatbot’s usefulness in enhancing their learning process. Emotional engagement measured student motivation and satisfaction while interacting with the chatbot. The instrument utilized a 5-point Likert scale, with responses spanning from never (1) to always (5), allowing participants to systematically and measurably self-assess their engagement levels. The behavioural engagement section included questions such as, “How often do you use the AI chatbot to practice English?” and “How much time do you typically spend using the chatbot each week?” Cognitive engagement was measured through items like, “Do you find the chatbot helpful in improving your English skills?” and “Do you actively think about how to use the language while interacting with the chatbot?” Emotional engagement was gauged through prompts such as, “Using the chatbot motivates me to learn English” and “I feel satisfied after using the chatbot for language practice.” This multi-dimensional approach provided a comprehensive understanding of student engagement with the AI tool by capturing perspectives from different facets of interaction. The English Proficiency Test was designed to assess participants’ current proficiency levels based on the CEFR framework, covering four key language skills. Administered before the engagement questionnaire, this test established the baseline English proficiency of the participants.
The results categorized students into A1, A2, B1, or B2 proficiency levels. The test ensured a holistic evaluation of each participant’s language ability by addressing all four language skills.

3.3 Data Collection

The data collection process was carried out in two main phases: administering the English Proficiency Test and completing the Engagement Questionnaire. The English Proficiency Test was conducted first to determine the participants’ current level of language proficiency. This test was distributed to all participants online, ensuring students could complete it conveniently. The test was timed, and students were given a fixed duration to complete all sections. The results of this test were used to classify participants into one of the four CEFR levels (A1, A2, B1, or B2), which were crucial for analysing engagement differences based on proficiency level. Following the proficiency assessment, the Engagement Questionnaire was administered electronically via Google Forms to the same cohort. The instrument was structured to prioritize user-friendly design and accessibility, facilitating efficient digital completion. Participants were instructed to respond to items reflecting their recent utilization of chatbots for language acquisition purposes. The anonymous questionnaire encouraged honest responses and reduced social desirability bias. The data collected from the questionnaire provided insights into how students engage with chatbots and how their engagement levels relate to their language proficiency. Ethical protocols were rigorously integrated throughout the study’s methodological procedures. Written informed consent was secured from all participants prior to their commencement of the proficiency assessment and engagement survey. The consent documentation explicitly detailed the research objectives, participants’ right to withdraw without penalty, and assurances regarding the anonymity and secure handling of all collected data.
The data was stored securely and used exclusively for research purposes. Personal information was kept anonymous to ensure the privacy of the participants.

3.4 Data Analysis

This study employed quantitative analytical techniques, encompassing both descriptive and inferential approaches, to investigate potential correlations between learner engagement and English proficiency outcomes. Descriptive analyses were conducted to systematically summarize the dataset derived from the Engagement Questionnaire, with central tendency (mean) and dispersion (standard deviation) metrics calculated for each engagement dimension – behavioural, cognitive, and emotional – across the four CEFR proficiency tiers (A1–B2). These computations facilitated a comparative evaluation of aggregate engagement tendencies and variability patterns among learner subgroups. Specifically, the mean values elucidated average engagement intensity per proficiency tier, while standard deviation measures revealed intra-group heterogeneity in engagement patterns, thereby contextualizing the uniformity or divergence of learner experience within each cohort. A one-way analysis of variance (ANOVA) was performed to investigate statistically significant variations in engagement levels across distinct proficiency tiers. This statistical test compared the means of engagement scores (behavioural, cognitive, and emotional) across the four proficiency groups (A1, A2, B1, and B2). A significant result from the ANOVA would indicate that engagement patterns differ significantly between students at different proficiency levels. This analysis helped to determine whether students at higher proficiency levels (B1, B2) were more or less engaged with chatbots compared to students at lower proficiency levels (A1, A2).
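The one-way ANOVA described above compares the variance of group means around the grand mean with the variance of individual scores within their own groups. A minimal sketch in Python, using invented engagement scores for the four CEFR groups rather than the study’s data, illustrates the computation:

```python
# Illustrative only: synthetic engagement scores for four CEFR groups,
# not the study's data. Computes the one-way ANOVA F statistic by hand.

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    k = len(groups)                           # number of groups
    n = sum(len(g) for g in groups)           # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of scores around their own group mean
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical mean engagement scores per student, grouped by CEFR level
a1 = [3.0, 3.2, 3.1, 3.4, 3.3]
a2 = [3.2, 3.4, 3.3, 3.5]
b1 = [3.4, 3.5, 3.6, 3.3]
b2 = [3.6, 3.7, 3.5]

f_stat, df_b, df_w = one_way_anova_f([a1, a2, b1, b2])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

With real data, the resulting F statistic is compared against the F distribution with (k − 1, n − k) degrees of freedom to obtain the p-value reported in the results section.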
A Pearson correlation coefficient analysis was conducted to assess the association between composite engagement metrics (behavioural, cognitive, emotional) and English proficiency scores. This analysis quantified both the magnitude and directional tendency of the relationship, with positive coefficients denoting proportional alignment between elevated engagement and higher proficiency outcomes, while negative values signified an inverse association. The results elucidated the extent to which engagement variables collectively influenced linguistic competency development. A multiple linear regression analysis was conducted to evaluate the predictive impact of learner engagement dimensions on English linguistic competency. Within this statistical framework, behavioural, cognitive, and emotional engagement metrics were operationalized as independent variables, while proficiency test results served as the dependent outcome measure. This analysis allowed for identifying the specific aspects of engagement that most strongly predict students’ language proficiency. By considering all three types of engagement simultaneously, the regression analysis provided a comprehensive understanding of how each form of engagement influences language proficiency.

4 Research Results

4.1 The Engagement of Indonesian University Students with Chatbots

The analysis of Indonesian university students’ engagement with AI chatbots across English proficiency levels (A1, A2, B1, and B2) revealed notable patterns, as summarized in Table 1. Analytical findings revealed that learners across all CEFR tiers exhibited intermediate engagement levels with AI-driven conversational interfaces within language acquisition contexts.

Table 1. Engagement with Chatbots across English proficiency levels.
English Proficiency Level   Mean   SD     Level of Engagement
A1                          3.21   0.45   Moderate
A2                          3.32   0.43   Moderate
B1                          3.45   0.40   Moderate
B2                          3.63   0.38   Moderate

When analysing specific dimensions of engagement, students at the A1 and A2 levels reported similar patterns of moderate behavioural and cognitive engagement, as shown in Table 2. Emotional engagement was reported as slightly lower for these levels than other dimensions, reflecting challenges in maintaining motivation and satisfaction during chatbot interactions. As illustrated in Table 3, learners at the B1 and B2 proficiency levels demonstrated elevated cognitive and behavioural engagement relative to their A1 and A2 counterparts, reflecting more profound immersion in and valuation of AI chatbot pedagogical interventions. Emotional engagement also improved, reflecting increased motivation and satisfaction among students with higher proficiency levels.

Table 2. Engagement dimensions for A1 and A2 students.

Engagement Dimension     A1 Mean   A1 SD   Level of Engagement   A2 Mean   A2 SD   Level of Engagement
Behavioural Engagement   3.28      0.47    Moderate              3.35      0.45    Moderate
Cognitive Engagement     3.15      0.50    Moderate              3.27      0.48    Moderate
Emotional Engagement     3.02      0.53    Moderate              3.13      0.51    Moderate

Table 3. Engagement dimensions for B1 and B2 students.

Engagement Dimension     B1 Mean   B1 SD   Level of Engagement   B2 Mean   B2 SD   Level of Engagement
Behavioural Engagement   3.52      0.44    Moderate              3.68      0.41    Moderate
Cognitive Engagement     3.46      0.46    Moderate              3.60      0.43    Moderate
Emotional Engagement     3.32      0.50    Moderate              3.47      0.48    Moderate

The engagement of Indonesian university students with chatbots varies across English proficiency levels.
The One-Way ANOVA results revealed significant differences in overall engagement levels among the four proficiency groups (F(3, 706) = 6.15, p < 0.001). Post Hoc Tukey HSD tests showed that B2 students demonstrated significantly higher engagement levels than A1 (p = 0.001) and A2 (p = 0.005) students. Additionally, B1 students displayed higher engagement than A1 students (p = 0.028). These results suggest that students with greater English proficiency tend to engage more actively with chatbots, particularly in the cognitive and behavioural dimensions. Emotional engagement also improved incrementally with proficiency, highlighting the importance of personalized experiences in fostering satisfaction and motivation.

4.2 Correlation Between Indonesian University Students’ Engagement with Chatbots and English Proficiency

The analysis results revealed significant and non-significant correlations between Indonesian university students’ engagement with AI chatbots and their English proficiency. The overall engagement score did not show a substantial relationship with English proficiency. However, when specific dimensions of engagement were examined, significant positive correlations were found between behavioural and cognitive engagement and English proficiency. Conversely, emotional engagement exhibited no significant correlation with English proficiency.

Table 4. Results of Pearson’s correlation analyses.

                               Behavioural   Cognitive    Emotional    Overall
                               Engagement    Engagement   Engagement
English Proficiency      r     .142**        .128**       .046
                         p     .001          .003         .143
Behavioural Engagement   r     1             .593**       .524**
                         p                   .000         .000
Cognitive Engagement     r                   1            .486**
                         p                                .000
Emotional Engagement     r                                1
Overall                  r                                             1

* Significant at the 0.05 level (two-tailed). ** Significant at the 0.01 level (two-tailed).
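Pearson’s r reported in Table 4 is the covariance of two variables scaled by the product of their standard deviations. A small illustrative computation, on synthetic engagement and proficiency scores rather than the study’s data, shows how a positive association yields a positive coefficient:

```python
# Illustrative only: Pearson's r computed by hand on synthetic
# (engagement score, proficiency score) pairs, not the study's data.
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sum of cross-products of deviations (unscaled covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the sums of squared deviations
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical behavioural engagement scores and proficiency test scores
engagement = [3.1, 3.3, 3.2, 3.6, 3.8, 3.4]
proficiency = [52, 58, 55, 66, 71, 60]

r = pearson_r(engagement, proficiency)
print(f"r = {r:.3f}")  # strongly positive: higher engagement, higher scores
```

A coefficient near +1 indicates that students with higher engagement also score higher; values near 0, as for emotional engagement in Table 4, indicate no linear association.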
The findings indicate that behavioural and cognitive engagement, which reflect active participation and the perceived usefulness of chatbots, are positively associated with the level of English proficiency. Emotional engagement, while important, did not exhibit a direct correlation, suggesting that motivation and satisfaction alone may not directly enhance proficiency. These results underscore the importance of fostering active and cognitively engaging interactions with chatbots to support language learning outcomes.

4.3 Predictive Roles of Indonesian University Students’ Engagement with Chatbots in English Proficiency

Multiple linear regression was performed to predict the English proficiency levels of Indonesian university students based on their engagement with chatbots. As presented in Table 5, the results demonstrated that behavioural engagement, cognitive engagement, and overall engagement were significant predictors of students’ English proficiency. However, this analysis found that emotional engagement had no meaningful predictive value for English proficiency.

Table 5. Results of regression analyses.

Model                    B      Std. Error   Beta   t       Sig.
Behavioural Engagement   .145   .054         .098   2.687   .008
Cognitive Engagement     .120   .048         .102   2.502   .013
Emotional Engagement     .036   .051         .029   .705    .481
Overall Engagement       .140   .050         .123   2.812   .005

These findings suggest that behavioural engagement, cognitive engagement, and overall engagement with the AI chatbot enhance English proficiency. In contrast, emotional engagement alone does not show a significant impact. The regression analysis indicates that students who engage more frequently and intensely (both behaviourally and cognitively) with the chatbot are likely to have higher levels of English proficiency. However, although important for motivation, emotional engagement did not emerge as a predictor in this context.
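The multiple regression in Table 5 fits proficiency as a linear function of the three engagement dimensions. The sketch below is a minimal ordinary least squares implementation via the normal equations, run on synthetic data constructed so that proficiency depends on behavioural and cognitive but not emotional engagement; all numbers are invented for illustration and the fitted coefficients recover that structure:

```python
# Illustrative only: OLS with three engagement predictors, solved via
# the normal equations. Synthetic data in which proficiency depends on
# behavioural and cognitive engagement but not on emotional engagement,
# mirroring the pattern of significant and non-significant predictors.

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    """Return OLS coefficients: solve (X'X) beta = X'y."""
    n = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(n)]
    return solve(XtX, Xty)

# Columns: intercept, behavioural, cognitive, emotional (hypothetical scores)
X = [[1, 3.0, 3.1, 3.4], [1, 3.2, 3.0, 3.1], [1, 3.5, 3.4, 3.2],
     [1, 3.7, 3.6, 3.5], [1, 3.3, 3.5, 3.0], [1, 3.6, 3.2, 3.3]]
y = [52.6, 53.6, 58.4, 61.2, 57.4, 58.0]  # = 10 + 8*beh + 6*cog exactly

beta = ols(X, y)
print([round(b, 3) for b in beta])  # emotional coefficient comes out near zero
```

In the study itself, the standardized Beta, t, and Sig. columns of Table 5 would additionally be derived from the coefficient standard errors; the sketch shows only the coefficient estimation step.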
The results emphasize the value of fostering more active and meaningful interactions with AI tools to improve language learning outcomes.

5 Discussion

This study sought to explore Indonesian university students’ engagement with AI chatbots in language learning and how this engagement correlates with their English proficiency. The results yield critical insights into the pedagogical efficacy of AI-facilitated language acquisition technologies and their measurable impact on learner performance metrics. Three key points worth discussing emerged from the data: the relationship between engagement and proficiency, the differences in engagement across proficiency levels, and the predictive role of engagement in English proficiency. The study’s initial findings revealed moderate engagement levels among learners utilizing AI-assisted language learning tools, with this trend remaining consistent across all proficiency levels, which mirrors the findings of previous studies in similar contexts (Xu and Li 2024; Oktarin et al. 2024). These findings align with the existing literature suggesting that learners typically demonstrate moderate engagement in educational activities mediated by chatbots, a pattern observed consistently across varying levels of linguistic proficiency. However, the study found notable differences when examining engagement in specific activities, such as frequency and duration of chatbot use. Higher proficiency students (B1, B2) tended to engage more frequently and for more extended periods than their lower proficiency counterparts (A1, A2). This finding supports the idea that more proficient learners will likely find more value in these tools, possibly because of their ability to understand and apply the language more effectively in interactive scenarios.
The findings suggest learners demonstrate intermediate engagement when utilizing chatbots for language acquisition, with this pattern persisting uniformly across all proficiency levels (Mageira et al. 2022). Second, regarding engagement and its correlation with English proficiency, the data indicate a significant positive relationship between learner interaction with AI-driven conversational platforms and advances in English linguistic competence. This aligns with existing scholarship emphasizing the efficacy of interactive pedagogical technologies in facilitating language acquisition (Yuan and Liu 2025). Notably, participants exhibiting elevated engagement metrics with these systems achieved superior performance outcomes on standardized proficiency measures. This suggests that engagement is not only a measure of the time spent interacting with learning tools but also an indicator of active cognitive and emotional involvement in the learning process, which can enhance language skills. These results support the argument made by Guo et al. (2023), who observed that engagement with educational technology tools, such as chatbots, can foster deeper learning and improve academic performance. Moreover, the study’s analysis revealed significant differences in engagement between proficiency levels. This was particularly noticeable in students’ more frequent and sustained interactions with the chatbots at higher proficiency levels.
Students at the A1 and A2 levels tended to have more limited engagement with the chatbot, focusing primarily on basic tasks and responses. In contrast, students at the B1 and B2 levels were more likely to engage in complex conversations and tasks, suggesting they could leverage the chatbot tools for more advanced learning opportunities. Consistent with previous research (Huang, Hew, and Fryer 2022), learners with advanced linguistic proficiency exhibit heightened efficacy in deploying educational technologies to cultivate lexical growth, grammatical accuracy, and holistic language mastery. The data also showed that engagement at the lower proficiency levels often involved more basic tasks, such as vocabulary drills or simple sentence construction, reflecting the learners’ limited linguistic ability. Lastly, when examining the predictive role of student engagement in English proficiency, the study found that engagement, particularly in frequency and duration of interaction with the chatbot, played a significant role in predicting English proficiency. The regression analysis results indicated that the more engaged students were with the AI chatbot, the higher their English proficiency tended to be. This is consistent with prior research on the predictive power of engagement in language learning (Alahmari and Alrabai 2024). Specifically, the study found that engagement with chatbots was a stronger predictor of English proficiency than other factors, such as traditional classroom instruction. This suggests that chatbots, with their interactive and personalized nature, provide an effective means of language practice that can lead to better proficiency over time. The study also found that engagement in emotionally and cognitively challenging tasks with the chatbot was particularly beneficial for improving language skills, as it encouraged deeper language processing (Schuetzler, Grimes, and Giboney 2020).
6 Conclusions

This study investigated the engagement of non-English majors at three Indonesian universities with chatbots in language learning and its correlation with their English proficiency. The findings revealed that within this specific context, higher engagement levels, particularly in frequency and duration of interaction with the chatbot, were positively associated with higher English proficiency. Students with higher proficiency levels demonstrated more active and sustained engagement with the chatbots, reflecting their ability to leverage these tools for more advanced learning opportunities. However, these patterns may not extend to other student populations or educational settings. The study also highlighted the predictive role of student engagement in their English proficiency among the sampled population. It demonstrated that the frequency and intensity of engagement with chatbots could significantly predict improvements in English proficiency within this cohort. These results suggest that chatbots could serve as a valuable tool for enhancing language proficiency among similar student populations, particularly in contexts with comparable demographics and learning environments. Educators in analogous settings might consider integrating AI-assisted learning tools into their curricula to foster engagement and improve language outcomes, though further research is needed to validate these implications for broader application. While this study offers valuable insights, there are some important limitations to consider. The small number of B2-level participants (n=6) makes it difficult to draw strong conclusions about this subgroup, particularly in statistical analyses like regression – though results for the A1, A2, and B1 groups remain reliable. Additionally, since the research was conducted at three universities in Indonesia, its findings may not fully apply to students in other regions or cultural contexts.
Another limitation lies in the methodology: relying solely on surveys and proficiency tests might overlook the lived experiences of students interacting with chatbots. Incorporating qualitative approaches, such as interviews or classroom observations, could uncover richer details about their challenges and behaviours. Future studies should aim for larger, more diverse participant pools across all proficiency levels, blending quantitative and qualitative methods, and exploring how AI tools impact language learning over time. Understanding what drives student engagement with these technologies will also be essential for creating tailored, effective learning strategies.

Acknowledgement

The authors would like to express their sincere gratitude to the Center for Higher Education Funding and Assessment (PPAPT) and the Indonesia Endowment Fund for Education (LPDP) at the Ministry of Finance of the Republic of Indonesia for their support and funding of this research.

References

Alahmari, Arwa, and Fakieh Alrabai. 2024. “The predictive role of L2 learners’ resilience in language classroom engagement.” Frontiers in Education 9. https://doi.org/10.3389/feduc.2024.1502420.

Al-Obaydi, Liqaa Habeb, Farzaneh Shakki, Ragad M. Tawafak, Marcel Pikhart, and Raed Latif Ugla. 2023. “What I know, what I want to know, what I learned: Activating EFL college students’ cognitive, behavioral, and emotional engagement through structured feedback in an online environment.” Frontiers in Psychology 13:1083673. https://doi.org/10.3389/fpsyg.2022.1083673.

Alsawaier, Raed S. 2018. “The effect of gamification on motivation and engagement.” The International Journal of Information and Learning Technology 35 (1): 56–79. https://doi.org/10.1108/IJILT-02-2017-0009.

Baskara, FX Risang. 2023. “Integrating ChatGPT into EFL writing instruction: Benefits and challenges.” International Journal of Education and Learning 5 (1): 44–55. https://doi.org/10.31763/ijele.v5i1.858.

Casal, J. Elliott, and Matt Kessler. 2023. “Can linguists distinguish between ChatGPT/AI and human writing?: A study of research ethics and academic publishing.” Research Methods in Applied Linguistics 2 (3): 100068. https://doi.org/10.1016/j.rmal.2023.100068.

Chang, Daniel H., Michael Pin Chuan Lin, Shiva Hajian, and Quincy Q. Wang. 2023. “Educational design principles of using AI chatbot that supports self-regulated learning in education: Goal setting, feedback, and personalization.” Sustainability 15 (17): 12921. https://doi.org/10.3390/su151712921.

Gayed, John Maurice, May Kristine Jonson Carlon, Angelu Mari Oriola, and Jeffrey S. Cross. 2022. “Exploring an AI-based writing assistant’s impact on English language learners.” Computers and Education: Artificial Intelligence 3:100055. https://doi.org/10.1016/J.CAEAI.2022.100055.

Guo, Kai, Yuchun Zhong, Danling Li, and Samuel Kai Wah Chu. 2023. “Investigating students’ engagement in chatbot-supported classroom debates.” Interactive Learning Environments 31 (6): 1–17. https://doi.org/10.1080/10494820.2023.2207181.

Hastomo, Tommy, Muhammad Fikri Nugraha Kholid, Pipit Muliyah, Linda Septiyana, and Widi Andewi. 2024. “Exploring how video conferencing impacts students’ cognitive, emotional, and behavioral engagement.” Journal of Educational Management and Instruction 4 (2): 213–25. https://doi.org/10.22515/jemin.v4i2.9335.

Hastomo, Tommy, Berlinda Mandasari, and Utami Widiati. 2024. “Scrutinizing Indonesian pre-service teachers’ technological knowledge in utilizing AI-powered tools.” Journal of Education and Learning (EduLearn) 18 (4): 1572–81. https://doi.org/10.11591/edulearn.v18i4.21644.

Hastomo, Tommy, and Linda Septiyana. 2022. “The investigation of students’ engagement in online class during pandemic COVID-19.” Jurnal Penelitian Ilmu Pendidikan 15 (2). https://doi.org/10.21831/JPIPFIP.V15I2.49512.
Huang, Weijiao, Khe Foon Hew, and Luke K. Fryer. 2022. “Chatbots for language learning – Are they really useful? A systematic review of chatbot-supported language learning.” Journal of Computer Assisted Learning 38 (1): 237–57. https://doi.org/10.1111/jcal.12610.

Karabiyik, Ceyhun. 2019. “The relationship between student engagement and tertiary level English language learners’ achievement.” International Online Journal of Education and Teaching 6 (2): 281–93. https://eric.ed.gov/?id=EJ1248494.

Kim, Hea Suk, Yoonjung Cha, and Na Young Kim. 2021. “Effects of AI chatbots on EFL students’ communication skills.” Korean Journal of English Language and Linguistics 21: 712–34. https://doi.org/10.15738/kjell.21.202108.712.

Kim, Susie. 2021. “Generalizability of CEFR criterial grammatical features in a Korean EFL corpus across A1, A2, B1, and B2 levels.” Language Assessment Quarterly 18 (3): 273–95. https://doi.org/10.1080/15434303.2020.1855647.

Logli, Chiara. 2016. “Higher education in Indonesia: Contemporary challenges in governance, access, and quality.” In The Palgrave Handbook of Asia Pacific Higher Education, edited by Christopher S. Collins, Molly N.N. Lee, John N. Hawkins and Deane E. Neubauer, 561–81. Palgrave Macmillan US. https://doi.org/10.1057/978-1-137-48739-1_37.

Mageira, Kleopatra, Dimitra Pittou, Andreas Papasalouros, Konstantinos Kotis, Paraskevi Zangogianni, and Athanasios Daradoumis. 2022. “Educational AI chatbots for content and language integrated learning.” Applied Sciences 12 (7): 3239. https://doi.org/10.3390/app12073239.

Moreira, Paulo A.S., Adelaide Dias, Carla Matias, Jorge Castro, Tânia Gaspar, and Joana Oliveira. 2018. “School effects on students’ engagement with school: Academic performance moderates the effect of school support for learning on students’ engagement.” Learning and Individual Differences 67 (October): 67–77. https://doi.org/10.1016/J.LINDIF.2018.07.007.
Nurchurifiani, Eva, Aksendro Maximilian, Galuh Dwi Ajeng, Purna Wiratno, Tommy Hastomo, and Andri Wicaksono. 2025. “Leveraging AI-powered tools in academic writing and research: Insights from English faculty members in Indonesia.” International Journal of Information and Education Technology 15 (2): 312–22. https://doi.org/10.18178/ijiet.2025.15.2.2244.
Oktarin, Irene Brainnita, Maria Edistianda Eka Saputri, Betty Magdalena, Tommy Hastomo, and Aksendro Maximilian. 2024. “Leveraging ChatGPT to enhance students’ writing skills, engagement, and feedback literacy.” Edelweiss Applied Science and Technology 8 (4): 2306–19. https://doi.org/10.55214/25768484.v8i4.1600.
Rudolph, Jürgen, Samson Tan, and Shannon Tan. 2023. “ChatGPT: Bullshit spewer or the end of traditional assessments in higher education?” Journal of Applied Learning & Teaching 6 (1): 342–63. https://doi.org/10.37074/jalt.2023.6.1.9.
Sari, Lusi Purnama, Tommy Hastomo, and Eva Nurchurifiani. 2023. “Assessing the efficacy of Duolingo for acquiring English vocabulary skills: Experimental research.” Journal of English Teaching Applied Linguistics and Literatures 6 (2): 193–200.
Schuetzler, Ryan M., G. Mark Grimes, and Justin Scott Giboney. 2020. “The impact of chatbot conversational skill on engagement and perceived humanness.” Journal of Management Information Systems 37 (3): 875–900. https://doi.org/10.1080/07421222.2020.1790204.
Shen, Chen, Penghai Shi, Jirong Guo, Suyun Xu, and Jiwei Tian. 2023. “From process to product: Writing engagement and performance of EFL learners under computer-generated feedback instruction.” Frontiers in Psychology 14 (October): 1–13. https://doi.org/10.3389/fpsyg.2023.1258286.
Shikun, Shan, Gevorg Grigoryan, Ning Huichun, and Hasmik Harutyunyan. 2024. “AI chatbots: Developing English language proficiency in EFL classroom.” Arab World English Journal 1 (1): 292–305. https://doi.org/10.24093/awej/ChatGPT.20.
Silitonga, Lusia Maryani, Santhy Hawanti, Feisal Aziez, Miftahul Furqon, Dodi Siraj Muamar Zain, Shelia Anjarani, and Ting Ting Wu. 2023. “The impact of AI chatbot-based learning on students’ motivation in English writing classroom.” In Innovative Technologies and Learning, 6th International Conference, ICITL 2023, Porto, Portugal, August 28–30, 2023, Proceedings, edited by Yueh-Min Huang and Tânia Rocha, 542–49. Springer. https://doi.org/10.1007/978-3-031-40113-8_53.
Slamet, Joko. 2024. “Potential of ChatGPT as a digital language learning assistant: EFL teachers’ and students’ perceptions.” Discover Artificial Intelligence 4 (1): 46. https://doi.org/10.1007/s44163-024-00143-2.
Thorp, H. Holden. 2023. “ChatGPT is fun, but not an author.” Science 379 (6630): 313. https://doi.org/10.1126/science.adg7879.
Tian, Lili, and Yu Zhou. 2020. “Learner engagement with automated feedback, peer feedback and teacher feedback in an online EFL writing context.” System 91 (July): 102247. https://doi.org/10.1016/j.system.2020.102247.
Ward, Wesley S., and Lisa M. Given. 2019. “Assessing intercultural communication: Testing technology tools for information sharing in multinational research teams.” Journal of the Association for Information Science and Technology 70 (4): 338–50. https://doi.org/10.1002/asi.24159.
Waziana, Winia, Widi Andewi, Tommy Hastomo, and Muhamad Hasbi. 2024. “Students’ perceptions about the impact of AI chatbots on their vocabulary and grammar in EFL writing.” Register Journal 17 (2): 328–62. https://doi.org/10.18326/register.v17i2.352-382.
Xu, Jinfen, and Juan Li. 2024. “Effects of AI affordances on student engagement in EFL classrooms: A structural equation modelling and latent profile analysis.” European Journal of Education 59 (4): e12808. https://doi.org/10.1111/ejed.12808.
Yu, Shulin, Lianjiang Jiang, and Nan Zhou. 2020.
“Investigating what feedback practices contribute to students’ writing motivation and engagement in Chinese EFL context: A large scale study.” Assessing Writing 44: 100451. https://doi.org/10.1016/j.asw.2020.100451.
Yuan, Lingjie, and Xiaojuan Liu. 2025. “The effect of artificial intelligence tools on EFL learners’ engagement, enjoyment, and motivation.” Computers in Human Behavior 162: 108474. https://doi.org/10.1016/j.chb.2024.108474.
Zou, Bin, Xin Guan, Yinghua Shao, and Peng Chen. 2023. “Supporting speaking practice by social network-based interaction in artificial intelligence (AI)-assisted language learning.” Sustainability 15 (4): 2872. https://doi.org/10.3390/su15042872.
Zulianti, Hajjah, Hastuti Hastuti, Eva Nurchurifiani, Tommy Hastomo, Aksendro Maximilian, and Galuh Dwi Ajeng. 2024. “Enhancing novice EFL teachers’ competency in AI-powered tools through a TPACK-based professional development program.” World Journal of English Language 15 (3): 117. https://doi.org/10.5430/wjel.v15n3p117.
Part IV: English Language and Literature Teaching
2025, Vol. 22 (1), 113-131(228)
journals.uni-lj.si/elope
https://doi.org/10.4312/elope.22.1.113-131
UDC: [811.111+811.112.2’243:37.09.3]:004.89
Saša Jazbec, Bernarda Leva, Marta Licardo
University of Maribor, Slovenia
AI Is Here to Stay: An Empirical Study of Attitudes Among Teachers of English and German
ABSTRACT
Artificial intelligence (AI) is a disruptor increasingly impacting foreign language learning and teaching. This paper explores the theoretical framework of AI, its application in foreign language teaching, and the question of whether AI is displacing foreign language teachers. The empirical part presents findings from a survey of English and German teachers (n = 112) in Slovenian primary and secondary schools regarding their views on AI in foreign language teaching.
Statistical analysis reveals a constructively critical attitude towards AI among teachers, who acknowledge its presence in and influence on teaching strategies, methods, and teacher roles but do not perceive it as a fundamental threat. Furthermore, statistical tests and correlations indicate no significant differences in attitudes towards AI in the classroom based on whether respondents teach English or German or whether they work in primary or secondary schools.
Keywords: AI, disruption, teaching English and German as a foreign language, challenges, problems
UI je prišla in bo ostala: empirična raziskava o stališčih učiteljev in učiteljic angleščine in nemščine
IZVLEČEK
Umetna inteligenca (UI) je kot disrupcija močno posegla tudi v učenje in poučevanje tujega jezika. V prispevku najprej osvetlimo teoretski okvir pojmovanja UI, razpravljamo o UI pri pouku tujega jezika in se posvečamo tudi vprašanju, ali UI izpodrinja učitelje in učiteljice tujega jezika. V empiričnem delu predstavljamo izsledke raziskave, v kateri so svoja stališča o UI pri pouku tujega jezika izrazili učitelji in učiteljice angleščine in nemščine (n = 112) v osnovnih in srednjih šolah v Sloveniji. Statistična analiza podatkov anketiranih je pokazala, da so do UI konstruktivno kritični, da se zavedajo njene prisotnosti in da zelo vpliva na strategije, metode dela pri pouku in delo učiteljev in učiteljic, jih spreminja, a jih ne ogroža. S statističnimi testi in korelacijami pa smo ugotavljali tudi, da ni statistično pomembnih razlik med stališči anketiranih do UI pri pouku glede na to, ali učijo angleščino ali nemščino, niti ne, ali delajo v osnovni ali v srednji šoli.
Ključne besede: UI, disrupcija, pouk angleščine in nemščine kot tujega jezika, izzivi, problemi
1 Introduction
Elias inspires Carinthian students in Austria as a sports teacher and students in Finland as an English teacher; Charlie supports primary school students in Switzerland in developing social skills and dealing with emotions; Pepper enjoys teaching pupils at a school in Serbia. Elias, Charlie, and Pepper are evidently excellent, popular teachers, yet they are also humanoid robots. They function based on artificial intelligence (AI); they learn and teach what is being taught to them, and they use both factual knowledge and non-verbal communication. Watching students communicate with humanoid robots is fascinating but also frightening. Students are motivated; they listen; they are willing to imitate the robot in sports; they smile when their errors are corrected; they endeavour to do better and are happy when the robot praises them and is satisfied with their work or performance. The enthusiastic engagement of students with humanoid robots offers a compelling glimpse into the potential of AI to shape learning experiences. However, the increasing sophistication of AI in education presents both opportunities and challenges. While AI-supported tools have become essential, the disruptive nature of this technology requires that educators adapt and prepare for significant change. After such a disruption, returning to the status quo ante is no longer possible. We must therefore accept AI as a new reality in education – which is the focus of this article – and develop strategies and procedures that enable teachers and AI to work together harmoniously and optimally in the educational process. This paper aims to present the conceptual framework of artificial intelligence in foreign language teaching. It also shares selected findings from an empirical survey of foreign language teachers in Slovenia, i.e. teachers of English and German, regarding the use of AI in their classrooms.
Finally, it explores potential differences in perspective between English and German teachers on this topic. Within the context of various dilemmas posed by the use of AI in education, this paper seeks, among other things, theoretical and empirical answers to the vital question of whether AI will ultimately replace the foreign language teacher. Comparable questions also formed the starting point for empirical research by the Vodafone Foundation in Germany. Their target audience, however, was not classroom teachers but citizens, or parents of school-age children. They conducted an interesting, topical, and representative study on AI in schools with 5,000 citizens and 500 parents of school-age children, with the meaningful title Expedition into the Unknown (Vodafone 2023, 1–24).1 Below, we summarise the most important findings. Analysis of the results reveals that slightly more than half the respondents (the study states a majority) believe that AI will significantly change the future of the classroom (54%). Although at the time of the survey they were still sceptical about the use of AI in school, seeing it as a threat rather than an opportunity (57%), they also wanted AI to become part of the curriculum (55%). The study explains this seemingly paradoxical finding by noting that those who understand that AI (e.g., ChatGPT) will remain part of our lives want children to be ready for this challenge. Respondents also believe that developing digital competences is primarily the responsibility of schools (77%) and only then of parents.
1 More than 5,000 German citizens aged 18+ and 500 parents with school-age children up to 18 years participated in the study. The empirical data was collected over three days, from 23 March 2023 to 25 March 2023, in an open online panel (Vodafone 2023).
Interestingly, two-thirds of the respondents also agreed that the regulation of the use of AI in school should be determined at the school level and not, as is common for school regulation in Germany, at the level of the federal state (cf. Vodafone 2023). The question of whether artificial intelligence will replace “natural intelligence,” i.e. the teacher, in the future is not a dilemma for the respondents of the Vodafone study, as 90% of them do not think this will happen. Having explored the perspectives of citizens and parents on AI in schools, it is crucial to establish a clear understanding of what exactly this term means. The subsequent section will explore the definition and historical context of artificial intelligence.
2 Artificial Intelligence
2.1 Artificial Intelligence – A Conceptual Framework
The term artificial intelligence was first used in 1956 by a group of experts at Dartmouth College as part of the Summer Research Project on Artificial Intelligence (1956). The experts set themselves the goal of describing the learning process and the characteristics of intelligence in such detail that they could develop a machine that could simulate this process (Ramge 2018, 33). The term artificial intelligence has since been frequently used in publications addressing the Turing test. Several experts, including Kačič (2024), have explored the appropriateness of the term artificial intelligence. According to Kačič, drawing on definitions from The Britannica Dictionary, intelligence is the ability to learn, understand and make judgements or form opinions based on reason, and the ability to cope with novel or tricky situations. The adjective artificial denotes a physical substitute with equivalent functionality to a natural counterpart (artificial hip, artificial knee, artificial tooth, etc.) and is used in various contexts, including technology and medicine (cf. Kačič 2024).
Since artificial intelligence does not have the equivalent functionality of natural intelligence, and since it learns but does not have the ability to judge, to understand what it has learnt or to hold an opinion, Kačič proposes the term virtual intelligence. Despite the conceptual appropriateness of the term virtual intelligence, Kačič (2024; cf. also De Florio-Hansen 2020, 46)2 acknowledges that the term artificial intelligence is so deeply rooted and so widely used that it would be difficult or downright impossible to change. A well-known and often quoted technicist definition of AI was drafted by the OECD in 2023 and revised in 2024: “An AI system is a machine-based system that, for explicit or implicit objectives, infers, from the input it receives, how to generate outputs such as predictions, content, recommendations, or decisions that can influence physical or virtual environments. Different AI systems vary in their levels of autonomy and adaptiveness after deployment” (OECD 2024). The European Parliament, more accessibly and tellingly, but also less precisely, has stated that “AI is the ability of a machine to display human-like capabilities such as reasoning, learning, planning and creativity” (European Parliament).
2 In addition to the term artificial intelligence, terms for the opposing type of intelligence, such as human intelligence, natural intelligence and non-AI, appear in professional publications. Interestingly, however, there is not the same unanimity in professional and popular circles when it comes to naming these intelligences as there is for artificial intelligence.
These definitions, and many others, are unanimous in ascribing to a machine, system, or programme capabilities similar to those of a human being, i.e. thinking, reasoning, learning, and communicating.
This understanding of AI will be the starting point for this paper.
2.2 Artificial Intelligence – A Development Framework
Since 1956, the development of artificial intelligence, and of the tools it supports, has experienced exponential growth. Despite this rapid development, experts distinguish between AI Winters and AI Summers. During an AI Winter, progress continues, but AI receives less attention from both experts and the public. Conversely, an AI Summer is characterised by intense development, with AI at the centre of both expert and non-expert discussions. Current predictions and analyses suggest we are in an extended AI Summer, with some even suggesting a “perpetual summer” (cf. Rubanau 2024). Experts also categorise AI into weak and strong AI (e.g., Wong 2020; Miao et al. 2021). Weak AI refers to tools and systems that focus on, and are highly successful at, solving specific problems (e.g., language learning and translation tools such as Duolingo, Grammarly, and Duden Mentor) (Marr 2018, 21). Strong AI, also called superintelligence, aims to create systems of neural networks that mimic human brain function, including the interpretation of emotions, feelings, and context, and that are capable of learning on their own. While numerous tools are powered by weak AI, those supported by strong AI are still evolving. It is the latter that have become a cause for concern and fear; the pace of development is breakneck, while the development of systems to control and monitor their use is lagging far behind. This concern was highlighted by a widely publicised open letter in early 2023, signed by leading technology figures, calling for a six-month moratorium on the development of AI systems more powerful than GPT-4. They argued that the development was too fast for legal certainty and that the risks to humans and humanity were too significant (Clarke 2023).
Despite this call, the proposal for a moratorium has not been implemented, and development is proceeding at its own rapid pace, as evidenced by numerous publications and studies. Above all, AI is increasingly permeating and transforming the educational space, including foreign language learning.
3 Artificial Intelligence in Foreign Language Learning
3.1 Literature Review
The number of publications on AI in and for education is growing exponentially. Experts from diverse fields (computer scientists, but also psychologists, philosophers, linguists, neuroscientists, economists, politicians, translators, etc.) are writing about AI with the common goal of getting to know, understanding and exploring the potentials and limitations of AI as much as possible. At the time of writing, for example, the University of Maribor’s electronic resources system lists 64,166 publications on artificial intelligence and education from the last five years in English (search string: artificial intelligence and education), 195 in German (search string: künstliche Intelligenz und Bildung), and 21 in Slovene (search string: umetna inteligenca) (e.g., UM:NIK 2025). There has been a surge of publications reviewing and analysing AI research. In particular, the number of discussions increased when OpenAI released ChatGPT, a revolutionary application, made freely available on 30 November 2022 (Hong 2023, 38). ChatGPT is a chatbot that can conduct a convincing dialogue with an interlocutor and offers a wide range of possibilities that go beyond traditional pedagogical procedures (Baskara and Mukarto 2023). Although it does not understand the questions but generates answers according to the principles of frequency and relevance (Thorp 2023, 313), it has made particularly strong inroads into the (foreign) language learning and teaching process.
Further possibilities and pitfalls of using ChatGPT in learning and teaching will not be discussed in this paper because of its limited scope (e.g., Hong 2023; Kasneci et al. 2023; Kartal 2023; Dolenc and Brumen 2024; Tica and Krsmanović 2024); ChatGPT will be considered as one of the AI tools in foreign language teaching. Among the contributions in Slovene that are of interest to the Slovene pedagogical area, we would like to highlight the following: 1) a monograph on contemporary perspectives on society and artificial intelligence (Bregant, Aberšek and Borstner 2022), which brings together high-profile scientific contributions in which AI is discussed interdisciplinarily and critically from the perspectives of computer scientists, psychologists and educators; and 2) a scientific monograph on the use of generative AI in education (Žerovnik and Zapušek 2024), which lays the theoretical groundwork for the innovative and practical use of AI in education, discusses the ethical aspects of its use, and identifies guidelines for the integration of generative AI in education. In addition to the above, there is a vast number of master’s and bachelor’s theses, as well as lectures, seminars, forums, and portals where users can learn about the practical possibilities of using tools, most often ChatGPT, in school. It is up to each individual to assess the quality, professionalism, criticality, accountability, and marketing interests behind these resources. The resources in English that are also of interest to the Slovenian pedagogical area are almost innumerable. We would like to highlight two key sources: First, an interdisciplinary scientific monograph by Licardo and Lipovec (2024) that explores the intersection of AI literacy and social-emotional skills within the educational context.
The contributions in this monograph are empirical studies conducted in Slovenia that address the technical and ethical aspects of AI while also providing deeper insight into social-emotional learning. The main purpose of the studies is to show, in a theoretically grounded and empirically supported way, how AI and social-emotional skills, as transversal competences, can be developed and integrated into educational frameworks. The second key source is a scientific paper by Dolenc and Brumen (2024) that focuses on foreign language teaching and investigates social science and computer science students’ perceptions of the integration and use of AI-based technologies in education. The empirical results highlighted an interesting aspect that is not often discussed in the context of AI and education, i.e. the importance of the gender and discipline of the teacher to the introduction of AI in education. Social science students and women are generally less inclined to use AI tools in foreign language education, often expressing doubts about their ability to enhance academic performance. These groups tend to be more critical of, or cautious about, the role AI plays in language learning. While they acknowledge that AI can be a useful tool to enrich the learning process, they also emphasise the irreplaceable value of human teachers in education. This empirical research is particularly relevant for the development of guidelines for teacher education, which usually do not consider the importance of the gender and professional profile of the teacher. There are also many papers on the question of whether AI will replace the teacher (e.g., Chan and Tsi 2024; Bouras 2024; Pettersson et al. 2024; Knaus 2024).
As a point of interest, we summarise an excursus by Knaus, who reflects from an educator’s perspective on whether teachers are still needed in the world of AI. Knaus (2024) observes that the question itself reflects a dystopian vision that runs like a thread through the history of media: as soon as a technical innovation has potential similar to a teacher’s, there is talk that it may displace them. Thus, at the beginning of his book, Knaus reports that technical innovation was once credited with breaking down the teacher’s “information monopoly.” School television, programmed learning, language labs, Virtual Learning Environments (VLEs), Personal Learning Environments (PLEs) and Massive Open Online Courses (MOOCs) could likewise be labelled as attempts at an educational revolution, each aiming to distribute information more widely and potentially displace the teacher. Knaus believes that, despite AI systems that are undoubtedly excellent, this will not happen, because the learning process is not only about interaction and the communication of knowledge (which AI can do) but also about relationships, the development of individuals, enculturation, social integration, and social competences, which can only be developed in society, in contact with other human beings (cf. Knaus 2024, 20–21).
3.2 Challenges and Problems of Using Artificial Intelligence in Foreign Language Teaching
Learning and teaching foreign languages has long been done with the help of ICT. Most foreign language teachers are familiar with ICT and use it regularly in their work. In the professional literature, this type of learning and teaching is called Computer-Assisted Language Learning (CALL) or Mobile-Assisted Language Learning (MALL). However, with developments in natural language processing, advances in deep and networked learning, and the increasing technological capacity to handle big data, Intelligent Computer-Assisted Language Learning (ICALL) has evolved.
On the one hand, Intelligent Computer-Assisted Language Learning systems have brought about a fundamental qualitative change in student-computer interaction (Kannan and Munday 2018); on the other hand, they have severely disrupted existing pedagogical formats of foreign language learning and teaching. Alongside this relativisation of existing pedagogical formats, ICALL has also sparked a series of controversial debates and reflections on the necessity and reasonableness of using AI for learning and teaching, as well as on the dangers and disruptive changes that its imminent use seems to imply (e.g., Strasser 2020; Dargan 2019; Renz et al. 2020). The biggest problems, fears, and legitimate dangers of AI in foreign language learning and teaching faced by teachers, decision-makers, students, and parents revolve around several key questions. These include the role of both the foreign language teacher and the learner in the new concepts of AI-assisted learning; issues of authorship, ethics, and copyright; issues of personal data protection and the regulation of AI use; and issues of the goals and competences to be developed in foreign language teaching, knowledge, testing, etc. Tica and Krsmanović (2024) address these concerns by emphasising student apprehensions about ChatGPT’s limitations. Students often worry that such tools may not effectively cultivate deep linguistic competence or critical thinking. Moreover, fears of plagiarism, diminished originality and shallow engagement with learning materials make some students reluctant to rely on AI. These concerns suggest that AI should complement rather than replace traditional teaching methods, serving as a supportive resource rather than a primary instructional tool. Despite the intense debates in this area, the issues are far too complex for us to expect answers soon, or even in step with technological developments.
This is particularly true in the field of education, where change is extremely slow and the gap between technological development and realised change at the implementation level is the greatest. Moreover, the media habitus of teachers (and decision-makers) lags far behind media developments (cf. Hartmann 2021; Burow 2022). Beyond the challenges and problems, it should be emphasised, and the expert community agrees, that AI will not (at least not for some time) replace the teacher and traditional learning and teaching formats, but it will change and complement them (cf. Renz, Krishnaraja and Gronau 2020; Hartmann 2023).3
4 Artificial Intelligence in Foreign Language Teaching in Slovenia – Findings from Empirical Research
In the empirical part, we present the views of foreign language (English and German) teachers in Slovenia on the use of artificial intelligence in foreign language learning and teaching. We start from the thesis that foreign language teachers in Slovenia are mostly hesitant towards the use of AI, that they do not consider AI to be serious competition for them in the future, and that there is no difference between the views of English and German teachers (cf. Jazbec 2024). The research questions below guided our analysis and were addressed using a survey questionnaire; the data collected provide the basis for the quantitative analysis and interpretation. While this study provides a quantitative overview, in-depth analyses of teachers’ attitudes, experiences, and practices, including the nuances of their perspectives, would require qualitative research and further interpretation of the data collected and the theoretical starting points. At the outset, it is essential to acknowledge that the data presented must be read and understood within the context of the study’s limitations. Several limitations should be considered when interpreting the findings of this study.
The sample was non-random, consisting of teachers who chose to participate. This could introduce selection bias, as those who are more interested in or favourable towards AI may have been more likely to respond.
3 Bill Gates made a similar point: “AI will never replace teachers, but it is going to revolutionise teaching & learning” (ASU+GSV Summit, San Diego, 2023).
Owing to the non-random sampling method, the results of the study cannot be generalised to all foreign language teachers in Slovenia or to other contexts. Since the survey was self-administered and anonymous, there is a potential for response bias. Teachers may have provided responses they perceived as more socially acceptable or favourable regarding their professional use of AI, such as overestimating their current use or expressing more positive attitudes than they genuinely hold.
4.1 Method
The purpose of the study is to gain insight into the views and beliefs of English and/or German language teachers on the use of artificial intelligence in foreign language teaching. The research questions were as follows:
• What are the beliefs of foreign language teachers on the use of AI in the future? Do they perceive AI as an opportunity or a threat? Are there significant differences in opinion between English and German language teachers on whether AI is an advantage or a threat in the classroom?
• What are the views of foreign language teachers on the role of the teacher in AI use and the impact of AI use on learning? Are there differences between German and English language teachers on these issues?
• What are the correlations between beliefs about the potential of AI to improve teaching in schools, beliefs about the possibility that AI will not completely replace foreign language teachers in the future, and perceptions of the effects of AI on positive changes in student learning habits?
• Do teachers know whether their students use AI for learning, and is there a difference between primary school teachers and upper secondary school teachers on this question?
4.2 Participants and Data Collection
The survey involved 112 foreign language teachers, including 46 German teachers, 41 English teachers, 19 teachers of both English and German, and 6 teachers of other languages or subject areas. Of the teachers in the sample, 44% teach at primary schools, 51% at upper secondary schools, and 5% elsewhere. Most teachers have 21 to 30 years of work experience (36%), followed by teachers with up to 10 years of experience (29%), then 11 to 20 years (26%), and the smallest proportion have 31 to 40 years of experience (9%). It can be concluded that the study involved experienced teachers, as more than two-thirds of the surveyed teachers have more than ten years of work experience. It is a non-random sample, and generalisation of the results is not possible. The profile of the respondents closely mirrors the overall population of foreign language teachers in Slovenia (Eurydice 2021/2022): comparable percentages are employed in primary and upper secondary schools, and German and English are equally represented in terms of the teacher profile. Also, most surveyed teachers have at least ten years of experience working in schools. Data were collected through a survey published on the online survey platform 1ka. Respondents could fill out the survey from May 2023 to August 2023. In the survey, they consented to the collection of data and the publication of results.
The survey is anonymous, and the data are processed at the group level.

4.3 The Instrument

This study employed a survey instrument designed to collect anonymous data, ensuring confidentiality and privacy. The instrument is designed to assess the attitudes and experiences of foreign language teachers concerning AI in education, comparing these to the broader teacher population in Slovenia. This allows for a detailed exploration of how AI is viewed within the educational context by those directly impacted by its integration. It includes 13 questions and utilises 51 variables to gather comprehensive insights. The questions cover a range of topics, including the current and potential future role of artificial intelligence in teaching, teachers’ perceptions of AI as an opportunity or threat, and the practical uses of AI in educational settings. The response options across the questions include Likert-type scales (e.g., strongly agree to strongly disagree), dichotomous choices (e.g., yes or no), and multiple-choice questions where respondents can select more than one answer. Specific questions explore the integration of AI in school, lesson planning, and the evaluation of student performance.

4.4 Analysis, Results and Interpretation

The analysis was conducted using descriptive and inferential statistics in SPSS. To analyse differences between teachers of English and German, as well as between primary and upper secondary school teachers, we used the t-test. For analysing correlations between individual variables, we used Pearson’s correlation coefficient.

4.4.1 Using AI in Teaching: Opportunity or Threat?4

Figure 1. Percentage (f%) of teachers’ responses on whether AI will significantly change teaching in the future.

Figure 2. Percentage (f%) of teachers’ responses on how they perceive the possibilities of using AI in teaching.
Figure 1 reveals that more than 80% of teachers believe that artificial intelligence will significantly change teaching in the future, while only 14% think this will not happen. Notably, German language teachers hold even stronger positive convictions (“absolutely yes”) than their English language counterparts.

Figure 2 demonstrates that the foreign language teachers in Slovenia are not hesitant about using AI in schools. Specifically, 59.1% of German teachers and 65.9% of English teachers view AI in schools as an opportunity or a significant opportunity, while 38.6% of German teachers and 26.9% of English teachers perceive the use of AI in schools more as a threat or an absolute threat. This group comparison suggests that German language teachers are more inclined to see AI as a potential threat compared to English language teachers. A comparative analysis of the data with the results of the Vodafone study (Vodafone 2023), where more than half the respondents (57%) saw AI more as a threat than an opportunity, reveals significantly different attitudes among foreign language teachers in Slovenia compared to the attitudes of parents in Germany. We can only hypothesise that the observed difference stems from foreign language teachers’ greater familiarity and experience with ICT tools compared to the parents surveyed in the Vodafone study (2023). Teachers have already recognised and tested the benefits of using AI and have certainly also encountered the pitfalls of AI use (e.g., written assignments as homework in foreign language teaching). Additionally, it is essential to consider in the analysis of survey results that the respondents were teachers who are familiar with AI, think about it, and engage with it.

4 This question and the data in Figures 1 and 2, previously published and discussed for the whole sample in Jazbec (2024), are presented here at the teacher group level. This serves as a foundation for our focus on differences between German and English teachers, the role of the teacher, and a comparison with the Vodafone study, all in the context of the original question.

Results in Table 1 indicate that teachers of English, based on the average response to the statement regarding whether they see the use of AI in schools as an opportunity, are slightly more favourable towards the idea that AI is an opportunity (M = 2.50; SD = 0.98) than teachers of German (M = 2.43; SD = 0.85). However, there are no statistically significant differences between teachers of German and teachers of English (t(80) = -0.33; p = 0.36).
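As an illustration of the procedure behind this comparison (not the authors’ SPSS output), the pooled-variance (Student’s) t statistic can be recomputed from the published summary statistics alone; the small discrepancy from the reported t(80) = -0.33 reflects rounding of the published means and standard deviations. A minimal sketch in Python:

```python
from math import sqrt

def pooled_t_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Student's independent-samples t-test computed from summary
    statistics (group means, standard deviations, and sizes)."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two group variances.
    sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (m1 - m2) / se
    return t, df

# Published summary statistics for the two groups (GEM vs. ENG).
t, df = pooled_t_from_summary(2.43, 0.85, 44, 2.50, 0.98, 38)
print(f"t({df}) = {t:.2f}")  # close to the reported t(80) = -0.33
```

The same computation is available in standard statistical software (e.g., SPSS, or scipy’s summary-statistics t-test) when only aggregated group data are at hand.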
This lack of significant difference is unexpected, given that most current AI tools and training data are primarily in English. One might hypothesise that this would lead English teachers to perceive AI as more readily applicable and a more significant opportunity. This finding aligns with comparative studies of AI in foreign language teaching, which often do not distinguish between target languages, or focus primarily on English (e.g., Yuan 2024; Du and Daniel 2024).

4.4.2 Using AI in Teaching: Perspectives on the Role of the Teacher and Its Impact on Students’ Learning Habits5

Figure 3. Percentage (f%) of teachers’ responses on whether AI could, under certain conditions, provide better instruction in schools than teachers (with natural intelligence).

Figure 4. Percentage (f%) of teachers’ responses on whether AI will not completely replace teachers in the future.

The data in Figures 3 and 4 provide insight into the perspectives of English and German teachers regarding the role and potential of artificial intelligence (AI) in educational contexts. Regarding the potential capability of AI to deliver superior teaching under certain conditions, it is evident that teachers across both language groups display considerable scepticism. Overall, 68.2% of foreign language teachers surveyed disagree or strongly disagree with the assertion that AI could surpass human teachers in instructional effectiveness. Conversely, a quarter (25.4%) of respondents across both groups recognise that, under specific circumstances, AI could indeed outperform human teachers. While this highlights a cautious acknowledgement of AI’s instructional potential, the majority viewpoint clearly favours human teaching competences. When considering the possibility of complete replacement of teachers by AI in the future (Figure 4), there is an even stronger consensus across both teacher groups.
An overwhelming 85.5% of teachers reject the notion of full AI replacement of human teachers, underscoring widespread confidence in the irreplaceability of human educators. Comparatively, these results align closely across both English and German teacher groups, illustrating a shared perception among language educators. Both groups express strong reservations about AI fully replacing human instructors, yet both cautiously acknowledge AI’s supplementary role, contingent upon specific educational conditions. This comparative analysis underlines that perspectives concerning AI’s instructional role appear remarkably consistent. Such unanimity may facilitate future international collaborative efforts aimed at responsibly integrating AI technologies into language education.

5 The data in Figures 3 and 4 were published in Jazbec (2024) at the level of the whole sample. They are presented here at the level of the groups because they are a starting point for analysing differences in attitudes towards AI, a factor with the potential to reshape classrooms and even the role of German or English teachers, which is the central focus of this paper.

Table 1. The t-test for differences between teachers of English and German language in attitudes about whether the use of AI in foreign language teaching represents an opportunity.

I perceive AI and its applications in schools more as an opportunity than a threat.
  Teachers GEM: N = 44, M = 2.43, SD = 0.85; Teachers ENG: N = 38, M = 2.50, SD = 0.98; Levene F = 0.35, p = 0.55; t(80) = -0.33, p = 0.36

The results in Table 2 indicate no significant differences in opinion on whether AI could, under certain conditions, provide better instruction than teachers. For teachers of German (GEM), the mean response was 2.98 (SD = 0.88); for teachers of English (ENG), the mean was 3.16 (SD = 0.96), t(80) = -0.91, p = .18, suggesting a consensus that AI might not entirely outperform teachers’ instruction under existing conditions. Concerning the opinion that AI will not completely replace teachers in the future, significant differences occurred between the two groups. Teachers of German reported a mean of 1.91 (SD = 1.03), indicating more scepticism about AI replacing teachers, whereas teachers of English reported a more optimistic viewpoint with a mean of 1.47 (SD = 0.69), t(82) = 2.25, p = .01. This suggests that teachers of German are more likely to believe that AI will not fully replace human teachers. However, the standard deviation in the group of German teachers is more than one, so these results should be interpreted with caution. Finally, attitudes towards AI’s potential impact on student learning habits also showed no significant difference, although the responses of English teachers leaned towards a more positive view. Teachers of German averaged 2.83 (SD = 1.06), while teachers of English averaged
slightly more optimistic at 3.08 (SD = 1.19), t(82) = -1.03, p = .15. Overall, these findings indicate varied levels of acceptance and scepticism among teachers regarding the role of AI in education.

Table 2. The t-test for differences in attitudes between teachers of English and German language on perspectives on the role of the teacher and its impact on student learning habits.

AI could, under certain conditions, provide better instruction in schools than teachers (with natural intelligence).
  GEM: N = 46, M = 2.98, SD = 0.88; ENG: N = 37, M = 3.16, SD = 0.96; Levene F = 1.39, p = 0.24; t(80) = -0.91, p = 0.18
AI will not completely replace teachers in the future.
  GEM: N = 46, M = 1.91, SD = 1.03; ENG: N = 38, M = 1.47, SD = 0.69; Levene F = 1.17, p = 0.28; t(82) = 2.25, p = 0.01
AI could potentially have a more positive than negative impact on student learning habits in the future.
  GEM: N = 46, M = 2.83, SD = 1.06; ENG: N = 38, M = 3.08, SD = 1.19; Levene F = 0.45, p = 0.51; t(82) = -1.03, p = 0.15

We were also interested in exploring the correlations between beliefs about the potential of artificial intelligence to improve instruction in schools, the belief that artificial intelligence will not completely replace teachers in the future, and the perception of the effects of artificial intelligence on positive changes in student learning habits.

Table 3. Means, standard deviations, reliabilities, and correlations of variables related to Perspectives on the Role of the Teacher and Its Impact on Student Learning Habits for English teachers.

1. AI could, under certain conditions, provide better instruction in schools than teachers (with natural intelligence): N = 40, M = 3.15, SD = 0.95
2. AI will not completely replace teachers in the future: N = 41, M = 1.51, SD = 0.71; correlation with 1: .06
3. AI could potentially have a more positive than negative impact on student learning habits in the future: N = 41, M = 3.10, SD = 1.17; correlation with 1: .44**; with 2: -.12
Note. The variables are measured on a scale from 1 to 4. Higher scores reflect a greater extent of the measured variable. *p<.05, **p<.01

Table 3 presents descriptive statistics and Pearson correlation coefficients among the three key variables for English teachers, reflecting their perspectives on AI’s role in education. A statistically significant positive correlation was found between the belief that AI could provide better instruction than teachers and the belief that AI could have a more positive than negative impact on student learning habits (r = .44, p = 0.004). This finding indicates that English teachers who perceive AI as potentially superior in instructional contexts are also likely to view its influence on student learning habits optimistically. Conversely, there was no significant correlation between the belief that AI will not completely replace teachers and the other two variables, suggesting that English teachers’ concerns about AI replacing teachers are independent of their views on the quality of AI instruction and its impact on student learning.

Table 4. Means, standard deviations, reliabilities, and correlations of variables related to Perspectives on the Role of the Teacher and Its Impact on Student Learning Habits for German teachers.

1. AI could, under certain conditions, provide better instruction in schools than teachers (with natural intelligence): N = 46, M = 2.98, SD = 0.88
2. AI will not completely replace teachers in the future: N = 46, M = 1.91, SD = 1.02; correlation with 1: .09
3. AI could potentially have a more positive than negative impact on student learning habits in the future: N = 46, M = 2.83, SD = 1.06; correlation with 1: .47**; with 2: -.01
Note. The variables are measured on a scale from 1 to 4. Higher scores reflect a greater extent of the measured variable. *p<.05, **p<.01

Table 4 displays descriptive statistics and Pearson correlation coefficients among the three main variables for German teachers, exploring their views regarding AI’s potential in education.
Similarly to the group of English teachers, the results indicate a significant and quite strong positive correlation between the belief in AI’s potential for providing better instruction and the belief that AI could positively affect student learning habits (r = .47, p < .001). This suggests that German teachers who have greater confidence in AI’s instructional capabilities also tend to be optimistic about AI’s beneficial effects on learning habits. However, no significant correlation emerged between the belief that AI will not completely replace teachers and the other measured variables (AI’s instructional quality and AI’s impact on learning habits). This implies that German teachers’ attitudes toward the likelihood of AI replacing human teachers are not associated with their perceptions of AI’s instructional effectiveness or its influence on student learning habits.

4.4.3 The Use of AI Among Students

The questionnaire focused on teachers and their opinions on the use of AI, but in one question, teachers also reflected on what they knew about the use of AI by their students. The data in Figure 5 show that the percentage of teachers who say they know that their students use AI (e.g., ChatGPT) in and for learning is extremely low at 8%. Slightly higher, but still low, is the percentage of teachers who say their students do not use AI in lessons (12%). The highest proportion believe that only some students use AI in and for lessons (44%), or do not know (36%). As the teachers’ responses show, the use of AI by students is very open-ended and left up to individuals, their preferences and needs. How they use AI, for what purposes, and whether they use it critically and constructively enough, or only reproductively and problematically from the point of view of authorship and knowledge acquisition, are questions that will need to be answered in the future, and the systemic basis for doing so will also need to be prepared.
Figure 5. Percentage (f%) of teachers’ responses on whether their students use AI for learning (e.g., ChatGPT).

Table 5. The t-test for differences between primary and upper secondary school teachers on whether their students use AI for learning (e.g., ChatGPT).

Do your students use artificial intelligence (e.g., ChatGPT) for learning?
  Primary school teachers: N = 47, M = 3.06, SD = 0.87; Upper secondary school teachers: N = 54, M = 3.15, SD = 0.88; Levene F = 0.16, p = 0.69; t(99) = -0.48, p = 0.31

The analysis of the data on the differences between secondary and primary school teachers’ knowledge of student use of AI showed that there were no statistically significant differences (t(99) = -0.48; p = .31). These findings are surprising, as we expected secondary school teachers to be more familiar with student use of AI than primary school teachers. Although the mean values show that secondary school teachers are slightly more familiar with it, the differences between them and primary school teachers are not significant.

5 Conclusion

In this paper, starting from the case of humanoid robot teachers and the ubiquity of AI in our lives and schools, we discuss the conceptual framework of AI, including its concept, its evolution and the changes reflected in the field of education. This theoretical background was illuminated by empirical data on the perceptions of foreign language teachers (English and German) of AI in school, particularly in foreign language learning and teaching, and by empirical data on differences in attitudes towards AI according to the teacher’s professional profile. AI, AI-powered tools, and humanoid robots are posing major challenges for schools, teachers, students, and decision-makers.
Given their capabilities, their rapid growth, and the disruptive changes they bring, these technologies seem to have become a permanent part of the education landscape. In addition to the development of AI, there is an intense debate at the discursive level, covering definitions of AI, analyses of its developmental phases, meta-studies on AI research (in schools), and several studies that address the technological, social, psychological, anthropological, and philosophical dimensions of the impact of AI on humans. Foreign language teaching has always been supported by various media, and AI is another medium that is profoundly shaping and changing it. AI supports the user in solving linguistic and non-linguistic problems efficiently, quickly, and often too “elegantly.” We sought to shed empirical light on all these theoretical orientations, assumptions, and experiences with AI in school from the perspective of the direct actors, i.e., foreign language teachers of English and German. The results of the study, despite the limitations we have identified, provide an illustration of and orientation for further work and research.

The findings reveal diverse perspectives among teachers regarding the role of AI in education. The majority believe that AI will significantly influence teaching in the future. German language teachers tend to express stronger opinions than English teachers, although both groups appear open to integrating AI in educational settings. Slightly more English teachers perceive AI as having potential, while a higher percentage of German teachers view it as a potential threat; however, these differences are not statistically significant. Despite some reservations, both groups demonstrate cautious optimism, viewing AI as a supportive tool rather than a replacement for human educators. The prevailing view is that AI will not replace teachers but can enhance teaching practices when implemented effectively.
The study did not find substantial differences between English and German teachers in how they perceive AI’s potential to improve instruction. German teachers were more likely to believe AI could not fully replace human teaching, though this should be interpreted with caution, given the standard deviation observed in the data. Teachers’ understanding of students’ use of AI remains limited. Many are unsure whether students are using AI at all. There were no statistically significant differences between primary and secondary teachers regarding this awareness.

Theory and empirical data support two views: 1) AI should be seen as an effective tool, an assistant that can optimise foreign language learning and teaching wherever gaps have been perceived, e.g., individualised learning, differentiation, motivation to learn through timely feedback, and above all support for the teacher in time-consuming administrative tasks; and 2) the theoretical background, research and empirical data presented above (this research and Vodafone 2023) show that the role of the teacher in school and in foreign language learning is stable, and that AI currently does not pose a threat as a substitute teacher for either English or German.

When considering AI in schools and foreign language teaching, we must acknowledge and address diametrically opposed yet legitimate perspectives from both theoretical and empirical standpoints. Chomsky warns against using AI, succinctly describing it as “sophisticated high-tech plagiarism” (Chomsky 2024). The slightly younger Hartmann, an expert and researcher on AI in foreign language teaching, draws a parallel to the German Emperor Wilhelm II, who in the early days of the automobile was convinced that it was a passing phenomenon and believed in the horse (Hartmann 2023). Connecting these viewpoints, we can concur with Chomsky’s assessment of AI as sophisticated plagiarism.
However, we must also recognise the validity of both Emperor Wilhelm’s scepticism about technological advancement and Hartmann’s assertion that AI’s disruptive influence on schools, learning, and foreign language instruction is here to stay.

References

Baskara, Risang, and Mukarto Mukarto. 2023. “Exploring the implications of ChatGPT for language learning in higher education.” Indonesian Journal of English Language Teaching and Applied Linguistics 7 (2): 343–58. https://ijeltal.org/index.php/ijeltal/article/view/1387.
Bouras, Sana. 2024. “AI and the bad teacher dilemma.” Journal of Science and Knowledge Horizons 4 (1): 39–57.
Bregant, Janez, Boris Aberšek, and Bojan Borstner. 2022. Contemporary Perspectives of Society: Artificial intelligence at the interface of science. Univerzitetna založba.
Burow, Olaf-Axel. 2022. Schule der Zukunft: Sieben Handlungsoptionen. Schule leiten. Beltz.
Chan, Cecilia Ka Yuk, and Louisa H. Y. Tsi. 2024. “Will generative AI replace teachers in higher education? A study of teacher and student perceptions.” Studies in Educational Evaluation 83: 101395. https://doi.org/10.1016/j.stueduc.2024.101395.
Chomsky, Noam. 2024. “Noam Chomsky on artificial intelligence, ChatGPT.” Through Conversations Podcast. Video, 5 min., 37 sec. https://www.youtube.com/watch?v=_04Eus6sjV4.
Clarke, Laurie. 2023. “Alarmed tech leaders call for AI research pause.” Science, April 11. https://www.science.org/content/article/alarmed-tech-leaders-call-ai-research-pause.
Dargan, James. 2019. “Artificial intelligence: The angel of death for foreign language teachers.” Medium, April 29. https://chatbotslife.com/artificial-intelligence-the-angel-of-death-for-foreign-language-teachers-cbff644a4967.
De Florio-Hansen, Inez. 2020. Digitalisierung, Künstliche Intelligenz und Robotik: Eine Einführung für Schule und Unterricht. utb.
Dolenc, Kosta, and Mihaela Brumen. 2024.
“Exploring social and computer science students’ perceptions of AI integration in (foreign) language instruction.” Computers and Education: Artificial Intelligence 7: 1–13. https://doi.org/10.1016/j.caeai.2024.100285.
Du, Jinming, and Ben Kei Daniel. 2024. “Transforming language education: A systematic review of AI-powered chatbots for English as a foreign language speaking practice.” Computers and Education: Artificial Intelligence 6: 100230. https://doi.org/10.1016/j.caeai.2024.100230.
European Parliament. 2023. “What is artificial intelligence and how is it used?” Topics, European Parliament, June 20. https://www.europarl.europa.eu/topics/en/article/20200827STO85804/what-is-artificial-intelligence-and-how-is-it-used.
Eurydice. 2021/2022. “Vzgoja in izobraževanje v Sloveniji.” https://eurydice.sio.si/publikacije/Vzgoja-in-izobrazevanje-v-RS-2021-22.pdf.
Hartmann, Daniela. 2021. “Künstliche Intelligenz im DaF-Unterricht? Disruptive Technologien als Herausforderung und Chance.” Informationen Deutsch als Fremdsprache 48 (6): 683–96. https://doi.org/10.1515/infodaf-2021-0078.
—. 2023. “Ersetzt die KI das Schreiben? ChatGPT & Co im DaF-Unterricht.” Cornelsen Fortbildungsveranstaltung, June 6, online from 16:00 to 16:45.
Hong, Wilson Cheong Hin. 2023. “The impact of ChatGPT on foreign language teaching and learning: Opportunities in education and research.” Journal of Educational Technology and Innovation 5 (1): 37–45.
Jazbec, Saša. 2024. “Umetna inteligenca oziroma orodja, podprta z umetno inteligenco, pri pouku in za pouk tujih jezikov: empirična raziskava o stališčih učiteljev tujega jezika v Sloveniji.” Ars & Humanitas 18 (1): 115–30. https://doi.org/10.4312/ars.18.1.115-130.
Kačič, Zdravko. 2024. “Kako inteligentna je umetna inteligenca?” Delo, Sobotna priloga, January 13.
https://www.delo.si/sobotna-priloga/kako-inteligentna-je-umetna-inteligenca.
Kannan, Jaya, and Pilar Munday. 2018. “New trends in second language learning and teaching through the lens of ICT, networked learning, and artificial intelligence.” Círculo de Lingüística Aplicada a la Comunicación 76: 13–30. https://doi.org/10.5209/CLAC.62495.
Kartal, Galip. 2023. “Contemporary language teaching and learning with ChatGPT.” Contemporary Research in Language and Linguistics 1 (1): 59–70. https://doi.org/10.62601/crll.v1i1.10.
Kasneci, Enkelejda, Kathrin Sessler, Stefan Küchemann, Maria Bannert, Daryna Dementieva, Frank Fischer, Urs Gasser, Georg Groh, Stephan Günnemann, Eyke Hüllermeier, Stephan Krusche, Gitta Kutyniok, Tilman Michaeli, Claudia Nerdel, Jürgen Pfeffer, Oleksandra Poquet, Michael Sailer, Albrecht Schmidt, Tina Seidel, Matthias Stadler, Jochen Weller, Jochen Kuhn, and Gjergji Kasneci. 2023. “ChatGPT for good? On opportunities and challenges of large language models for education.” Learning and Individual Differences 103: 102274. https://doi.org/10.1016/j.lindif.2023.102274.
Knaus, Thomas. 2024. “Künstliche Intelligenz und Pädagogik – ein Plädoyer für eine Perspektiverweiterung.” Ludwigsburger Beiträge zur Medienpädagogik – LBzM 24: 1–34. https://doi.org/10.21240/lbzm/24/11.
Licardo, Marta, and Alenka Lipovec, eds. 2024. Artificial Intelligence Literacy and Social-emotional Skills as Transversal Competencies in Education. Verlag Dr. Kovač.
Marr, Bernard. 2018. “The key definitions of artificial intelligence (AI) that explain its importance.” Forbes, February 14. https://www.forbes.com/sites/bernardmarr/2018/02/14/the-key-definitions-of-artificial-intelligence-ai-that-explain-its-importance/.
Miao, Fengchun, Wayne Holmes, Huang Ronghuai, and Hui Zhang. 2021. AI and Education: Guidance for policymakers. UNESCO Publishing.
OECD. AI Policy Observatory. n.d. “OECD AI Principles overview.” Archived July 2, 2023. https://oecd.ai/en/ai-principles.
Pettersson, Jenny, Elias Hult, Tim Eriksson, and Tosin Adewumi. 2024. “Generative AI and teachers – for us or against us? A case study.” arXiv:2404.03486. https://doi.org/10.48550/arXiv.2404.03486.
Ramge, Thomas. 2018. Mensch und Maschine. Wie künstliche Intelligenz und Roboter unser Leben verändern. Reclam.
Renz, André, Swathi Krishnaraja, and Elisa Gronau. 2020. “Demystification of artificial intelligence in education. How much AI is really in the educational technology?” International Journal of Learning Analytics and Artificial Intelligence for Education 2 (1): 14–30. https://doi.org/10.3991/ijai.v2i1.12675.
Rubanau, Ihar. 2024. “Artificial intelligent seasons.” IIoT World, June 21. https://www.iiot-world.com/artificial-intelligence-ml/artificial-intelligence/artificial-intelligent-seasons/.
Strasser, Thomas. 2020. “Künstliche Intelligenz im Sprachunterricht. Ein Überblick.” Revista Lengua y Cultura. Biannual Publication 1 (2): 1–6. https://dialnet.unirioja.es/servlet/articulo?codigo=9114327.
Thorp, H. Holden. 2023. “ChatGPT is fun, but not an author.” Science 379 (6630): 313. https://doi.org/10.1126/science.adg7879.
Tica, Lena, and Ivana Krsmanović. 2024. “Overcoming the writer’s block? Exploring students’ motivation and perspectives on using ChatGPT as a writing assistance tool in ESP.” ELOPE: English Language Overseas Perspectives and Enquiries 21 (1): 129–49. https://doi.org/10.4312/elope.21.1.129-149.
UM:NIK. 2025. https://plus.cobiss.net/cobiss/um/sl/bib/search.
Vodafone. 2023. “Aufbruch ins Unbekannte.” Vodafone Stiftung, April 20. https://www.vodafone-stiftung.de/ki-an-schulen/.
Wong, K. Gary, Xiaojuan Ma, Pierre Dillenbourg, and John Huan. 2020. “Broadening artificial intelligence education in K-12: Where to start?” ACM Inroads 11 (1): 20–29. https://doi.org/10.1145/3381884.
Yuan, Yijia. 2023.
“An empirical study of the efficacy of AI chatbots for English as a foreign language learning in primary education.” Interactive Learning Environments 32 (10): 6774–89. https://doi.org/10.1080/10494820.2023.2282112.
Žerovnik, Alenka, and Matej Zapušek. 2024. Uporaba generativne umetne inteligence v izobraževanju. Založba UL Pedagoške fakultete. https://zalozba.pef.uni-lj.si/index.php/zalozba/catalog/book/226.

Attitudes of Primary and Secondary EFL Teachers in Croatia Towards the Use of AI in Classroom Settings

ABSTRACT
The use of artificial intelligence (AI) in language learning has rapidly increased with the widespread popularity of generative AI tools such as ChatGPT. Research highlights the need for school-age learners to develop digital literacy skills to engage critically and responsibly with AI-based tools. Equally important is the role of (language) teachers, who must possess the skills necessary to guide students in navigating and leveraging this technology effectively. This exploratory study investigates the extent of EFL teachers’ knowledge and their attitudes toward using AI tools for language learning. Focusing on primary and secondary school EFL teachers in Croatia, the study aims to shed light on their perspectives on and preparedness for the integration of AI into the language classroom, addressing a critical aspect of modern education and contributing to a deeper understanding of what educators need to successfully incorporate AI into their teaching.

Keywords: artificial intelligence (AI), teacher attitudes, EFL, digital competence, teaching methods, primary school, secondary school

Pogled osnovno- in srednješolskih učiteljev in učiteljic angleščine kot tujega jezika na Hrvaškem na uporabo UI pri pouku

IZVLEČEK
Uporaba umetne inteligence (UI) pri učenju jezikov je močno narasla z razmahom generativnih orodij UI, kot je ChatGPT.
Raziskave poudarjajo potrebo po digitalni pismenosti učencev in učenk za kritično in odgovorno uporabo orodij UI. Prav tako je pri tem ključna vloga učiteljev in učiteljic (jezikov), ki morajo imeti ustrezna znanja za učinkovito usmerjanje učečih se pri uporabi te tehnologije. Ta študija ugotavlja raven znanja in stališča učiteljev in učiteljic angleščine kot tujega jezika (EFL) do uporabe orodij UI pri jezikovnem pouku. Raziskava osvetljuje stališča in pripravljenost učiteljev in učiteljic osnovnih in srednjih šol na Hrvaškem za vključevanje UI v jezikovni pouk ter prispeva k razumevanju njihovih potreb za uspešno integracijo UI v učni proces.

Ključne besede: umetna inteligenca (UI), stališča učiteljev in učiteljic, angleščina kot tuji jezik, digitalne kompetence, učne metode, osnovna šola, srednja šola

2025, Vol. 22 (1), 133–150(228)
journals.uni-lj.si/elope
https://doi.org/10.4312/elope.22.1.133-150
UDC: [811.111’243:37.091.3(497.5)]:004.8

Bojan Prosenjak, University of Zagreb, Croatia
Eva Jakupčević, University of Split, Croatia

1 Introduction
AI-based tools have gained traction in all areas of life in the last few years, with education and language learning being no exceptions. The ever-changing digital landscape of the 21st century has brought about a shift, necessitating a redefinition of both the roles of stakeholders and the methodologies used in education (Aghaziarati, Nejatifar, and Abedi 2023, 35). It has become crucial for teachers to possess the knowledge and understanding to effectively assess and implement these tools in their classes (Luckin et al. 2022, 2).
Research has highlighted the importance of developing digital literacy skills in school-age learners to ensure their critical and responsible use of AI-based tools, which means that teachers need to have skills adequate to support and guide students in their exploration of this technology (Gisbert Cervera and Caena 2023; Javier and Moorhouse 2023). However, studies have indicated that teachers might not possess these skills and may even harbour negative perceptions related to AI (Kohnke, Moorhouse, and Zou 2023b; Nazaretsky et al. 2022, 916). AI-powered tools offer multiple opportunities for language learning and teaching, which may be particularly advantageous in EFL contexts, where students usually have limited access to authentic language use outside the classroom. Among other benefits, AI-based applications can provide learning feedback, help with translation, aid teachers in creating activities and scenarios for language learning, and support language assessment (Creely 2024, 158). However, despite the increasingly recognised potential of AI for language learning, its effective integration in education depends on teacher attitudes towards the technology, which influence both their methods and student outcomes (Yue, Jong, and Ng 2024, 19510). Therefore, investigating teachers’ perceptions of AI tool integration in EFL education across diverse settings is both pertinent and timely. Notably, there is a lack of research on this topic in the context of Croatian primary and secondary schools, a gap that the present study seeks to address. 2 Theoretical Background As AI becomes more widespread in education, there has been a growing body of research examining the advantages and potential drawbacks of using AI-driven tools in language learning (e.g., Chiu et al. 2023; Creely 2024; Javier and Moorhouse 2023; Kohnke, Moorhouse, and Zou 2023b; Rebolledo Font De la Vall and González Araya 2023). 
AI-powered resources for language learning may include language tutoring systems that give personalised feedback; generative AI tools that can generate text or activities adapted to specific levels or groups of students; text-to-speech software; image creation software, etc. (Rebolledo Font De la Vall and González Araya 2023, 7569). Studies indicate that there is potential in these tools for improving language learning results (Liang et al. 2023). For example, AI-powered tools can offer a space for interaction for EFL learners, who often have limited opportunities to use the target language in their everyday lives, and chatbots have been found effective in improving students’ oral proficiency in English as well as their willingness to communicate in the L2 (Timpe-Laughlin, Sydorenko, and Daurio 2022; Yuan 2023). Personalisation has been seen as another benefit of AI for language education, with AI-based assessment and feedback tools potentially offering a more adaptive and targeted learning experience (Yesilyurt 2023). In addition to these benefits for learners, AI can support teachers in enhancing their teaching capabilities, developing adaptive strategies, and advancing their professional development (Chiu et al. 2023). On the other hand, numerous challenges associated with the use of AI, both in education broadly and in language learning specifically, have been identified, such as ethical concerns, for example, those related to authorship of content generated by AI, or the lack of transparency in its use (Creely 2024). Data privacy and bias have also frequently been mentioned in relation to the copious amounts of data analysed by AI-based systems (Kohnke, Moorhouse, and Zou 2023b).
Other concerns include potential overreliance on technology in assessment, which could result in the loss of “nuanced, empathetic feedback and the vital interpersonal connection between educators and learners” (Yesilyurt 2023, 33). Moreover, relying on AI tools to track learning progress could influence students’ capacity to build self-regulation skills, which are essential for lifelong learning (Molenaar 2022). The appropriate implementation of AI-based tools will therefore require language teachers to possess a specific set of skills and knowledge about both the potential advantages and limitations of AI (Kohnke, Moorhouse, and Zou 2023a), as the teachers’ lack of knowledge can hinder students’ development of digital competence (Nascimbeni and Vosloo 2019). The importance of the teachers’ perspectives is further emphasised in a study by Polak, Schiavo, and Zancanaro (2022), which involved teachers, school psychologists, and education managers from schools across four European countries. The study found that a strong willingness to learn about AI and incorporate digital tools into teaching is crucial for the effective integration of AI in education. While some studies mention negative attitudes towards AI among teachers, with reports of anxiety and concerns about the future (Chiu et al. 2023, 12), the majority indicate that teachers recognise both the advantages and limitations of using AI in education. For instance, teachers (N = 28) from diverse backgrounds in a study by Aghaziarati, Nejatifar, and Abedi (2023, 39) acknowledged AI’s potential to improve individualised learning and encourage innovative teaching methods, but they also raised issues regarding ethics as well as the need for infrastructure and ongoing training for teachers.
Similarly, a small-scale study of twelve university language instructors in Hong Kong by Kohnke, Moorhouse, and Zou (2023b) found that while participants were overall optimistic about the potential of AI-powered tools in language education, they lacked confidence in teaching students to use these tools productively and responsibly. They also voiced concerns regarding the ethical and practical difficulties linked to the adoption of AI technologies. Comparable results have been reported in other studies, where teachers view AI positively but regularly acknowledge their limited knowledge of its practical applications (Chounta et al. 2022; Galindo-Domínguez et al. 2024b; Polak, Schiavo, and Zancanaro 2022; Sütçü and Sütçü 2023). These findings underscore the widespread need for systematic teacher training, which should start at the pre-service level. For instance, while expressing optimism about the effect of AI on education and on teaching and learning EFL, Slovak pre-service English language teachers (N = 137) reported having no (61.31%) or limited (21.17%) knowledge of the fundamental principles of AI (Pokrivcakova 2023). Only 35.04% of the participants believed their understanding of AI-based tools for EFL teaching was sufficient, whereas most (64.24%) supported the idea of including AI education in their university studies. Research also reveals that an insufficient understanding of how AI technologies operate may limit the teachers’ ability to fully integrate them into learning, teaching, and assessment (Chiu et al. 2023, 12). For example, a study by Chounta et al. (2022) validated the perceived knowledge of 131 Estonian primary and secondary teachers about AI, using a questionnaire with statements related to the technology.
Most participants (57%) answered 60% of the questions correctly, and fewer correct answers were provided by teachers who considered themselves more knowledgeable, suggesting that misconceptions may be a hindrance to the implementation of AI. This is further supported by research conducted by Galindo-Domínguez et al. (2024b) on 445 Spanish teachers from diverse backgrounds, which found a positive relationship between teachers’ digital competence and their perceptions of AI. According to this study, teachers with greater digital competence tend to experience fewer difficulties when using educational technology, which in turn fosters a more positive attitude towards its integration. These findings once again accentuate the importance of providing teachers with training in both general digital competences and AI-related issues to enhance their confidence and effectiveness in using these tools. However, it must be emphasised that a low level of digital competence can be offset by a high level of motivation to learn about AI tools, which has been found to be among the key factors for incorporating AI into education (Polak, Schiavo, and Zancanaro 2022). Many teachers across different educational contexts express uncertainty about how to handle AI-related issues in education, underscoring the need for further research to determine the specific support required to help them navigate these challenges. While numerous studies have explored teachers’ attitudes towards AI in education in general, there is a scarcity of research focusing specifically on foreign language teachers working with primary and secondary school learners. As far as we are aware, no studies of this kind have been carried out in the Croatian context.
This gap in the literature highlights the importance of the present study, which aims to explore the attitudes of primary and secondary school EFL teachers in Croatia towards AI use in education, as well as their perceptions of the potential benefits and challenges that this technology presents in the field of foreign language teaching and learning.

3 The Present Study
3.1 Aim and Research Questions
The aim of the present study was to investigate the attitudes of primary and secondary school EFL teachers in Croatia towards the use of AI in language teaching and learning, as well as to explore their perspectives on the possible advantages and disadvantages brought by the implementation of this technology in the EFL classroom. To this end, the following research questions have been formulated:

RQ1: What are the attitudes of primary and secondary school EFL teachers in Croatia towards the use of AI in EFL teaching and learning?
RQ2: To what extent do the teachers’ age, length of teaching experience, and type of school where they work influence their attitudes towards the use of AI in EFL teaching?
RQ3: What are the perspectives of EFL teachers in Croatia on the potential advantages and disadvantages of incorporating AI in their EFL teaching practice?

3.2 Study Context
Students in Croatian schools are required to begin studying a foreign language, typically English, from the first grade (pupils aged 6/7). EFL is taught through two weekly lessons in lower primary (1st to 4th grade, seventy lessons per year) and three weekly lessons in upper primary (5th to 8th grade, 105 lessons per year), while the number of weekly lessons in secondary school depends on the type of school. To teach EFL in a Croatian school, teachers must hold an MA in EFL teaching or in primary education with a specialisation in teaching English to primary-aged children.
In recent years, there has been a growing recognition of the importance of artificial intelligence (AI) in education in Croatia. Notably, a handbook on AI in education has been published by the Agency for Electronic Media and UNICEF (2024), and curricula for elective subjects on AI for both primary and secondary schools have been developed by CARNET – the Croatian Academic and Research Network (2024a; 2024b). Additionally, a variety of webinars and resources have been made available to support teachers in integrating AI into their classrooms. However, to our knowledge, there is currently no systematic education about AI in the context of English language teaching or teacher education programs in Croatia.

3.3 Participants
The participants in our study were sixty-three primary and secondary school EFL teachers from across Croatia, five of whom were male and the rest female. Most of the participants were between 41 and 50 years old (Figure 1), of whom almost half worked in primary and half in secondary school (Figure 2).
Figure 1. Participants by age group (N = 63).

Figure 2. Participants by type of school of employment (N = 63).

Almost half the participants had been working in school for more than 20 years, and over a third for between 10 and 20 years (Figure 3).

Figure 3. Participants by length of teaching experience (N = 63).

3.4 Instruments
For the purposes of this study, two questionnaires were used. Questionnaire 1 was developed and validated in a study by Galindo-Domínguez et al. (2024b) with the aim of analysing teachers’ attitudes towards the use of AI in education (Appendix 1): the final scale consisted of 25 items divided into four factors or dimensions: willingness to use AI (items 1-3), attitude towards AI (items 4-10), professional expectations towards AI (items 11-20), and personal experiences with AI (items 21-25).
Questionnaire 1 required the participants to select one of the five values on a Likert scale for each of the twenty-five items, indicating the degree to which they agreed with the given statement. The higher the value for each item, the stronger the participants’ agreement with the statement, reflecting a more positive attitude of EFL teachers towards AI use in education. For items Q11 and Q21, the values were recoded before the analysis. Questionnaire 2 was designed by the authors of the present study, drawing on relevant literature and prior studies in the domain of teacher attitudes towards AI (Aghaziarati, Nejatifar, and Abedi 2023; Chounta et al. 2022; Kohnke, Moorhouse, and Zou 2023b).
Its purpose was to gather responses that would support the qualitative analysis of the results and answer the research questions posed in this study. It included eight open-ended questions which the participants were invited to answer but that remained optional (Appendix 2).

3.5 Data Collection and Analysis
The data for this study was collected in the winter semester of the school year 2024/2025. The questionnaires were sent to teachers across Croatia by email and were posted on Facebook groups for primary and secondary school EFL teachers in Croatia. The introductory section outlined the goal of the study and made it clear that the teachers’ responses would help in achieving this goal. Next, it was stated that participation was voluntary and anonymous, and that the participants could withdraw at any moment. Apart from the two questionnaires, the participants’ demographic data was also collected – their profession, gender, age, type of school where employed, and the length of their teaching experience. To address RQ1, the participants’ responses from Questionnaire 1 were analysed quantitatively using descriptive statistics. The responses were also grouped into four factors, as identified by Galindo-Domínguez et al. (2024b), and the mean values for each factor were calculated. Next, a multiple linear regression analysis was performed to answer RQ2. This analysis examined the relationship between the participants’ average questionnaire scores and three predictors: age, type of school, and length of teaching experience. Questionnaire 2 was analysed qualitatively, using a thematic analysis approach. The two authors independently reviewed the teachers’ responses and identified recurring themes based on frequency and relevance.
This was followed by a collaborative process in which the findings were compared and discussed until a consensus was reached on the key themes that emerged from the data. This iterative process ensured that the analysis was thorough and reflective of the teachers’ perspectives. Focusing on the themes most frequently mentioned ensured that the key insights were captured while maintaining the richness of the data.

3.6 Results and Discussion
To determine the attitudes of EFL teachers in Croatia towards the use of AI in EFL teaching and learning (RQ1), the participants’ responses in Questionnaire 1 were analysed. After categorising the responses into four factors as identified by Galindo-Domínguez et al. (2024b), we found all four mean values to be above the scale midpoint, although varying across the factors (Table 1). The findings indicate that most participants in the present study are willing to use AI in their classes (Factor 1) and have a positive attitude towards it (Factor 2). However, fewer teachers have positive professional expectations regarding AI (Factor 3), and an even smaller number have had positive firsthand experiences with it (Factor 4). This pattern mirrors the results obtained by Galindo-Domínguez et al. (2024b) for the four factors, with Factor 1 rated the highest, and Factor 4 the lowest by the Spanish teachers in their study. However, unlike our participants, teachers in Spain exhibited only moderately high or neutral values for Factors 1, 2 and 3 (3.73, 3.60, and 3.33 respectively) and a moderately low value for Factor 4 (2.24). In other words, the participants in our study expressed greater willingness to use AI in their classes and more positive attitudes towards the technology compared to the Spanish teachers.
While they reported lower values regarding their professional expectations of the technology, these values were still higher than those of their counterparts in Galindo-Domínguez’s study. Despite this difference, the results highlight the need for providing teachers with training and examples of effective practices for using AI in EFL classrooms, which will be discussed in more detail at a later point. The results for individual questionnaire items (Table 2) provide more detailed insight into the teachers’ attitudes. The highest mean value was recorded for statement 4 (“I am interested in learning about artificial intelligence in education.”), followed by statement 5 (“I am interested in exploring the use of artificial intelligence as a complementary tool for my teaching practice.”) and statement 3 (“I would love to be able to use artificial intelligence in my work as a teacher.”). These results indicate that most teachers in our study are eager to learn about how AI can be implemented in their lessons and how they can use it in class. On the other hand, statement 23 showed the lowest mean value (“I have extensive experience with the use of artificial intelligence in education.”), followed by statements 24 (“I can share my knowledge and skills about artificial intelligence with other teachers.”), and 13 (“Artificial intelligence will positively revolutionise education.”). These findings suggest that many teachers lack sufficient experience with AI in their classrooms, limiting their ability to share their expertise with colleagues. Additionally, a sizeable number of participants expressed scepticism about AI’s potential to revolutionise education. The results of our quantitative analysis are largely in line with previous studies, where teachers reported generally positive attitudes towards the use of AI in education but also noted a lack of the skills necessary to implement AI tools successfully in their teaching (e.g., Chounta et al. 2022; Galindo-Domínguez et al. 
2024a, 2024b; Pokrivcakova 2023; Polak, Schiavo, and Zancanaro 2022; Sütçü and Sütçü 2023). However, as opposed to teachers in some studies, for example, the pre-service teachers in Pokrivcakova (2023), these participants were not entirely optimistic about AI’s potential to revolutionise education. Their cautiously positive stance is further elaborated on in their responses to Questionnaire 2 and reflected in other studies, such as Sütçü and Sütçü (2023), where Turkish EFL teachers of university preparatory classes also showed awareness of the advantages of AI along with concerns regarding its potential adverse effects on education. The teachers in the study by Kohnke, Moorhouse and Zou (2023b) also expressed caution and emphasised the overall lack of structured training and consistent information from their institution.

Table 1. Mean values of participants’ answers to Questionnaire 1 per factor (N = 63).

Factor   Minimum   Maximum   M      SD
1        1.00      5.00      4.17   1.028
2        1.71      5.00      4.09   0.895
3        1.60      5.00      3.71   0.831
4        1.00      4.80      3.27   1.030

These results further underscore the importance of teacher training, not only in AI-based tools but also in general digital competences, as highlighted by Kohnke, Moorhouse and Zou (2023a), since studies have shown that attitudes towards and implementation of AI in the classroom depend on the teachers’ confidence in using such tools (Galindo-Domínguez et al. 2024a). The results of the regression analysis conducted to examine the extent to which the teachers’ age, length of teaching experience, and type of school where they work influence their attitudes towards AI use in EFL teaching (RQ2) indicate that the overall model was not statistically significant (F(3,59) = 1.229, p = .307). This suggests that the type of school, teachers’ age, and the length of their teaching experience do not significantly explain the variance in the average teacher score.
Therefore, the answers given in the questionnaires by all teachers who took part in the present study could be treated equally, regardless of how old they were or how long they had been working in either a primary or secondary school.

Table 2. Mean values of participants’ answers to Questionnaire 1 (N = 63).

Question   Minimum   Maximum   M      SD
Q1         1         5         4.11   1.049
Q2         1         5         4.13   1.114
Q3         1         5         4.27   1.050
Q4         1         5         4.44   0.894
Q5         1         5         4.35   1.003
Q6         1         5         3.65   1.246
Q7         1         5         4.02   1.085
Q8         2         5         4.17   0.959
Q9         1         5         3.87   1.085
Q10        1         5         4.10   0.979
Q11        1         5         4.03   1.047
Q12        1         5         3.17   1.100
Q13        1         5         3.11   1.094
Q14        1         5         3.67   1.092
Q15        1         5         3.86   0.913
Q16        1         5         4.00   1.000
Q17        1         5         3.87   0.959
Q18        2         5         4.13   0.852
Q19        1         5         3.37   1.005
Q20        1         5         3.87   0.924
Q21        1         5         4.06   1.268
Q22        1         5         3.52   1.105
Q23        1         5         2.54   1.162
Q24        1         5         2.70   1.328
Q25        1         5         3.54   1.330

These results reflect those from other studies. For example, in Galindo-Domínguez et al. (2024a, 2024b), teachers’ attitudes towards AI did not differ based on the educational stage in which they were employed. However, when it comes to technology acceptance in general, results were more varied. Some studies have pointed out that younger teachers are more open to incorporating technology in their practice (e.g., O’Bannon and Thomas 2014; Joseph, Thomas, and Nero 2021), but this trend was not observed in our sample. Joseph, Thomas and Nero (2021) also found that more experienced teachers tended to use less technology in the classroom, a finding that contrasts with other studies, such as that conducted by Gu, Zhu and Guo (2013), who found novice teachers to be less reliant on technology. In other words, it appears that studies to date have found few consistent patterns regarding the influence of personal and sociodemographic factors on teachers’ attitudes towards technology, which is confirmed by our results.
Following the presentation of the quantitative findings, the teachers’ answers to the open questions in Questionnaire 2 (Appendix 2) were analysed thematically to provide more insight into the perspectives of EFL teachers in Croatia on the potential advantages and disadvantages of incorporating AI in their EFL teaching practice (RQ3). The thematic analysis of responses from Questionnaire 2 revealed three main categories of benefits: those associated with lesson planning and material design, those primarily benefiting students, and those related to assessment. Of thirty-six participants who answered the question about the potential of implementing AI in EFL teaching and learning, twenty mentioned its role in lesson planning and/or designing materials, with six highlighting timesaving as a key benefit. The terms ‘personalisation’ and ‘adaptation’ frequently appeared, referring to tailoring tasks, content, and programs to individual learners’ needs and interests, including those in mixed-ability classes and learners with special educational needs. This type of differentiation was also linked to increased student motivation and engagement, which corresponds to findings by Brinegar (2023). Six participants specifically discussed adapting materials in relation to the existing EFL curriculum, noting that AI enables the creation of multiple curriculum versions that can be improved, tailored, or individualised for specific students. In addition, AI was considered a fast way to generate creative ideas and a useful tool for generating tasks such as reading comprehension exercises (e.g., using ChatGPT), materials accompanying videos or audio recordings, interactive and visually engaging presentations (e.g., Canva), generating images and flashcards (e.g., from YouTube), creating dialogue scenarios, discussion questions, grammar tasks, quizzes, and gamified activities. 
These benefits of AI for lesson planning and creativity have also been recognised in previous studies (e.g., Chounta et al. 2022; Sütçü and Sütçü 2023). Regarding the second category of benefits, fifteen participants highlighted the advantages of AI primarily for students, particularly in providing additional help or guidance in learning. They noted that engaging, interactive, up-to-date, and diverse activities created with the help of AI can foster active student participation, resulting in more dynamic classes where students can develop unexpected solutions, which in turn enables exchanging experiences and networking among students. Some examples included AI facilitating faster and easier access to information, encouraging independent learning, boosting motivation, and offering instant feedback. Participants also highlighted AI’s potential to support speaking and pronunciation practice through voice recognition software, to develop critical thinking by enabling students to create their own content, and to teach them to craft written and spoken prompts. Finally, a participant mentioned AI’s potential for creating chatbots to tutor students, support students with learning difficulties, and provide additional assistance through assistive technology features. The final category of benefits identified by participants was assessment, with six participants highlighting it as having the greatest potential for AI integration in their lessons. They provided examples such as using AI to create adapted tasks, to design reading and listening comprehension tests for specific topics or vocations, and to develop formative and summative assessment rubrics (e.g., by using tools such as MagicSchool). AI was also seen as a way to make test correction faster, easier, and more precise (e.g., uploading existing rubrics).
Additionally, participants mentioned using AI to generate revision quizzes (e.g., via Kahoot and Quizlet), and two participants specifically highlighted its utility for writing descriptive grades and providing feedback for parents. The three groups of benefits identified in the participants’ answers align closely with those discussed in the current research on AI in education. For instance, in a recent systematic literature review of ninety-two articles, Chiu et al. (2023) identified four key domains where AI is beneficial: AI in student learning, AI in teaching, AI in administration, and AI for assessment. These domains mirror the categories found in our participants’ responses, suggesting that teachers are familiar with the major advantages AI can offer. The parallels between the findings in this study and the broader literature indicate that educators are not only aware of AI’s potential but also able to recognise its relevance across various aspects of their professional practice. However, it must be noted that most teachers focused on a limited set of benefits, primarily those relating to lesson planning and designing learning materials. This is unsurprising, since such time-saving methods are likely to resonate with teachers who juggle multiple tasks, including administrative duties. Additionally, these uses of generative AI are likely the most accessible and straightforward, while fewer teachers might recognise the more specific benefits, possibly those who have had additional training. Analysis of the participants’ responses also highlighted several concerns and challenges, which can be grouped into four categories, the first of which encompasses concerns about misuse and ethical issues. Unsurprisingly, many of the participants (21/41) focused on issues of plagiarism and cheating, mentioning the example of ChatGPT being used by students to draft essays and create presentations. Teachers also noted students’ misuse of AI for schoolwork, homework, and tests. 
Two participants even pointed out that teachers could be complicit in plagiarism by presenting AI-generated materials as their own. The second category centres on the potential dangers of overreliance on AI, or of using AI tools “without thinking,” particularly regarding its impact on critical thinking and creativity (7 participants). Three participants expressed concern that students’ need to invest little effort and time in some tasks nowadays could erode key skills, for example, research skills, problem-solving, critical thinking, summarising, writing essays, creating presentations, spelling, and translating. Some teachers noted that the ease of AI-generated tasks might create a false sense of achievement, further diminishing motivation and creativity. In this context, several of the teachers commented that AI should be applied “in the right measure.” While many teachers acknowledged AI as a potential tool for improving motivation, several expressed concern that overreliance on AI tools would reduce students’ overall motivation and creativity. One teacher even stated that reliance on AI would bring about a reduction in people’s intelligence because “we will not bother to do things anymore,” and another participant said the following: “I do not like the direction we’re heading in. We are losing our humanity.” The third category addresses the limitations and risks of AI tools, as well as concerns about digital literacy. Five participants mentioned the limitations of AI, such as age restrictions, bias, and the inability to always interpret context or provide accurate feedback, which can result in student frustration and even data privacy infringement. Several teachers (5/41) emphasised the need to use reliable AI tools, especially when doing online tasks and tests, and to educate students on what happens to their data online.
A few participants noted that the materials generated by AI needed adaptation, as they can be an excellent starting point but still require careful review before use in class. Furthermore, four participants raised concerns about low digital literacy among both students and teachers, stressing the necessity for better education and preparation to use AI effectively in the classroom. They also highlighted the considerable time needed to teach students how to learn using AI tools, with teachers often lacking training themselves. Additionally, two participants underscored the absence of instruction on the ethical considerations regarding the use of AI. The final category involves the broader social and professional implications of AI in education. Concerns were raised about the impact of AI on the teaching profession, with four participants expressing fear that AI could diminish the teacher’s role, particularly in assessment, where overreliance on AI could reduce insight into students’ individual needs. One participant also mentioned that overreliance on AI tools could reduce the role of the teacher and ultimately lead to teachers losing their work. Two participants pointed out the potential social alienation that could result from increased automation, stressing that AI cannot replace the vital emotional and social support provided by teachers. Additionally, the digital divide was identified as an issue, with some schools lacking the technology necessary to integrate AI effectively, leading to unequal educational opportunities. Luckin (2017, 3) also notes that “[t]he less able and poorer students in society are generally least well served by education.” This disparity could further marginalise some students or even entire schools, as one participant in the present study mentioned.
Finally, two participants made notable comments, with one stating, “I really don’t want to solve problems that could be prevented simply by not using AI,” and another remarking, “I think less damage would be done by getting rid of AI than working with it. But I do understand that what’s done is done, it’s here now, and the role of the teacher with time will be not to make humanity smarter through teaching, but to do their best to slow down the process of getting stupider.” The results of the qualitative analysis reveal that, in discussing the challenges presented by the integration of AI into their practice, most teachers in our study focused on ethical issues, particularly cheating. This is not surprising, as we can assume it is a concern they face daily. For example, secondary school teachers mentioned the issue of preparing students for the matura (school-leaving) exam, which includes an essay-writing task in its EFL component. Some teachers reported that students used generative AI to write their essays, leaving them unprepared for the exam. To address such issues, assessment practices will need to be ‘AI-proofed’ and better aligned with current trends and the needs of students in the 21st century. However, these changes need to start at the top. It is not enough for teachers to have the necessary skills; they also need support through curricula and more appropriate school-leaving exams. In terms of other challenges related to AI, which are frequently discussed in the literature, only a few teachers in our sample mentioned any. Examples include the widening digital divide (Luckin 2017; Chiu et al. 2023) and data privacy issues (Luckin 2017; Kohnke, Moorhouse, and Zou 2023b). This suggests that some teachers may lack in-depth knowledge of these critical issues, which should be addressed through appropriate professional development programs.
Indeed, many teachers stressed the need for training, both for themselves and for their students, a concern that is frequently highlighted in other studies (Chiu et al. 2023).

4 Conclusion

The results of our study reveal an overall positive outlook on the use of AI in EFL teaching and learning, with teachers expressing their readiness to integrate AI into their classes. However, their professional expectations and firsthand experiences with AI were less positive. This is not unexpected, as educational technology has often sparked controversy among educators (Wegerif and Major 2023), especially when seen as a threat to students’ learning agency (Han et al. 2024). These concerns underscore the importance of integrating AI thoughtfully, in ways that support, rather than diminish, the vital role of teachers in promoting student autonomy and engagement. Further analysis revealed that these teachers lack experience with AI in education and are unable to share their skills with peers. This finding is in line with previous studies (e.g., Chounta et al. 2022; Pokrivcakova 2023; Polak, Schiavo, and Zancanaro 2022; Sütçü and Sütçü 2023), where teachers reported having limited knowledge of how to implement AI in their teaching. Given that AI-based tools have only recently become more widely recognised, this lack of familiarity is not surprising. While the teachers were generally positive about AI, they remained cautious about its potential to revolutionise education. This caution was echoed in the qualitative part of the study, where several teachers emphasised the importance of exercising restraint in integrating AI, ensuring it does not overwhelm the educational process. Additionally, no significant connection was found between teachers’ attitudes and the type of school, their age, and the length of experience in our sample, suggesting that AI training should be accessible to all teachers, regardless of their background.
Both primary and secondary school teachers, as well as those with varying levels of experience, displayed similar attitudes and concerns. Our qualitative analysis also revealed that while some teachers demonstrated a nuanced awareness of the advantages and disadvantages presented by AI, the majority focused on its potential for lesson planning and material generation, and on the risk of cheating and plagiarism. These results highlight the importance of developing training programs that encompass both the technical and ethical considerations related to AI, as pointed out by Kohnke, Moorhouse and Zou (2023a). Such training should not only empower teachers to use AI effectively but also equip them to instruct students about the responsible use of these technologies. These results are particularly relevant for university educators and pre-service teachers, since incorporating AI-related topics into teacher education programs is key to preparing future teachers to make effective use of AI in their practice and adapt to an evolving educational landscape. While the present study offers valuable insights, several limitations should be acknowledged. First, the limited sample size and its composition may affect the generalisability of the findings, especially as the volunteer participants may not accurately reflect the wider teacher population in Croatia. The sample included a disproportionate number of female participants, reflecting the general gender trend in the teaching profession, but this imbalance may still limit the generalisability of the findings. Furthermore, participants were partially recruited via Facebook and other social media platforms, which may have led to a bias towards individuals already inclined to use technology.
Another limitation is that participants were primarily drawn from urban and semi-urban areas, and specific data on their regional background was not systematically collected. This prevented an in-depth exploration of potential regional differences (e.g., centre vs. periphery). These issues could be addressed in future research by including a larger and more diverse sample, with a focus on regional variation, and by exploring alternative recruitment methods to reach a broader range of participants. Additionally, since the data was based on self-reporting, there is further potential for bias. The study also focused on a limited set of variables, while other factors, such as prior exposure to AI or specific training, might have influenced attitudes. Finally, the study may not have fully accounted for variations in access to resources or institutional support, which could influence teachers’ willingness to adopt AI in their classrooms. Further studies could build on the findings by exploring additional variables, such as available training opportunities and institutional support, which could also have a considerable impact on shaping attitudes. Examining how access to technological resources influences teachers’ adoption of AI could provide an important context for understanding the barriers to and opportunities for AI implementation in schools. In conclusion, our study underscores the need for practical, consistent teacher training to translate the positive attitudes of EFL teachers towards AI into effective classroom practices. By addressing the advantages and disadvantages of AI, we can better prepare students with the resources and knowledge needed to engage with AI in the 21st century.

References

Agency for Electronic Media and UNICEF. 2024. Umjetna inteligencija u obrazovanju. https://www.medijskapismenost.hr/wp-content/uploads/2024/04/Umjetna-inteligencija-u-obrazovanju.pdf.
Aghaziarati, Ali, Sara Nejatifar, and Ahmad Abedi. 2023.
“Artificial intelligence in education: Investigating teacher attitudes.” AI and Tech in Behavioral and Social Sciences 1 (1): 35–42. https://doi.org/10.61838/kman.aitech.1.1.6.
Brinegar, Merrilee. 2023. “Chatbots as a supplementary language learning tool: Advantages, concerns, and implementation.” International Journal of Education and Social Science Research 6 (6): 223–30. https://doi.org/10.37500/IJESSR.2023.6615.
CARNET – Croatian Academic and Research Network. 2024a. Kurikulum fakultativnog predmeta za srednje škole Umjetna inteligencija: od koncepta do primjene. https://www.carnet.hr/pogledajte-kurikulume-o-umjetnoj-inteligenciji-za-osnovne-i-srednje-skole/.
—. 2024b. Kurikulum izvannastavne aktivnosti za osnovne škole Umjetna inteligencija: od koncepta do primjene. https://www.carnet.hr/pogledajte-kurikulume-o-umjetnoj-inteligenciji-za-osnovne-i-srednje-skole/.
Chiu, Thomas K.F., Qi Xia, Xinyan Zhou, Ching Sing Chai, and Miaoting Cheng. 2023. “Systematic literature review on opportunities, challenges, and future research recommendations of artificial intelligence in education.” Computers and Education: Artificial Intelligence 4: 100118. https://doi.org/10.1016/j.caeai.2022.100118.
Chounta, Irene-Angelica, Emanuele Bardone, Aet Raudsep, and Margus Pedaste. 2022. “Exploring teachers’ perceptions of artificial intelligence as a tool to support their practice in Estonian K-12 education.” International Journal of Artificial Intelligence in Education 32 (3): 725–55. https://doi.org/10.1007/s40593-021-00243-5.
Creely, Edwin. 2024. “Exploring the role of generative AI in enhancing language learning: Opportunities and challenges.” International Journal of Changes in Education 1 (3): 158–67. https://doi.org/10.47852/bonviewIJCE42022495.
Galindo-Domínguez, Héctor, Nahia Delgado, Lucía Campo, and Daniel Losada. 2024a.
“Relationship between teachers’ digital competence and attitudes towards artificial intelligence in education.” International Journal of Educational Research 126: 102381. https://doi.org/10.1016/j.ijer.2024.102381.
Galindo-Domínguez, Héctor, Martin Sainz de la Maza, Lucía Campo, and Daniel Losada. 2024b. “Design and validation of a multidimensional scale for assessing teachers’ perceptions towards artificial intelligence in education.” International Journal of Learning Technology (online first). https://doi.org/10.1504/ijlt.2023.10062094.
Gisbert Cervera, Mercè, and Francesca Caena. 2022. “Teachers’ digital competence for global teacher education.” European Journal of Teacher Education 45 (4): 451–55. https://doi.org/10.1080/02619768.2022.2135855.
Gu, Xiaoqing, Yuankun Zhu, and Xiaofeng Guo. 2013. “Meeting the ‘digital natives’: Understanding the acceptance of technology in classrooms.” Journal of Educational Technology & Society 16 (1): 392–402.
Han, Ariel, Xiaofei Zhou, Zhenyao Cai, Shenshen Han, Richard Ko, Seth Corrigan, and Kylie A. Peppler. 2024. “Teachers, parents, and students’ perspectives on integrating generative AI into elementary literacy education.” In CHI ‘24: Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, edited by Florian Floyd Mueller, Penny Kyburz, Julie R. Williamson, Corina Sas, Max L. Wilson, Phoebe Toups Dugas, and Irina Shklovski, 1–17. Association for Computing Machinery. https://doi.org/10.1145/3613904.3642438.
Javier, Darren Rey C., and Benjamin Luke Moorhouse. 2023. “Developing secondary school English language learners’ productive and critical use of ChatGPT.” TESOL Journal 15 (2): e755. https://doi.org/10.1002/tesj.755.
Joseph, Genimon Vadakkemulanjanal, Kennedy Andrew Thomas, and Alex Nero. 2021. “Impact of technology readiness and techno stress on teacher engagement in higher secondary schools.” Digital Education Review 40: 51–65. https://doi.org/10.1344/der.2021.40.51-65.
Kohnke, Lucas, Benjamin Luke Moorhouse, and Di Zou. 2023a. “ChatGPT for language teaching and learning.” RELC Journal 54 (2): 537–50. https://doi.org/10.1177/00336882231162868.
—. 2023b. “Exploring generative artificial intelligence preparedness among university language instructors: A case study.” Computers and Education: Artificial Intelligence 5: 100156. https://doi.org/10.1016/j.caeai.2023.100156.
Liang, Jia-Cing, Gwo-Jen Hwang, Mei-Rong Alice Chen, and Darmawansah Darmawansah. 2023. “Roles and research foci of artificial intelligence in language education: An integrated bibliographic analysis and systematic review approach.” Interactive Learning Environments 31 (7): 4270–96. https://doi.org/10.1080/10494820.2021.1958348.
Luckin, Rose. 2017. “Towards artificial intelligence-based assessment systems.” Nature Human Behaviour 1 (3): 0028. https://doi.org/10.1038/s41562-016-0028.
Luckin, Rosemary, Mutlu Cukurova, Carmel Kent, and Benedict Du Boulay. 2022. “Empowering educators to be AI-ready.” Computers and Education: Artificial Intelligence 3: 100076. https://doi.org/10.1016/j.caeai.2022.100076.
Molenaar, Inge. 2022. “The concept of hybrid human-AI regulation: Exemplifying how to support young learners’ self-regulated learning.” Computers and Education: Artificial Intelligence 3: 100070. https://doi.org/10.1016/j.caeai.2022.100070.
Nascimbeni, Fabio, and Steven Vosloo. 2019. Digital Literacy for Children: Exploring Definitions and Frameworks. UNICEF Office of Global Insight and Policy.
Nazaretsky, Tanya, Moriah Ariely, Mutlu Cukurova, and Giora Alexandron. 2022. “Teachers’ trust in AI-powered educational technology and a professional development program to improve it.” British Journal of Educational Technology 53 (4): 914–31. https://doi.org/10.1111/bjet.13232.
O’Bannon, Blanche W., and Kevin Thomas. 2014.
“Teacher perceptions of using mobile phones in the classroom: Age matters!” Computers & Education 74: 15–25. https://doi.org/10.1016/j.compedu.2014.01.006.
Pokrivcakova, Silvia. 2023. “Preparing teachers for the application of AI-powered technologies in foreign language education.” Journal of Language and Cultural Education 7 (3): 135–53. https://doi.org/10.2478/jolace-2019-0025.
Polak, Sara, Gianluca Schiavo, and Massimo Zancanaro. 2022. “Teachers’ perspective on artificial intelligence education: An initial investigation.” In CHI EA ‘22: Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems, edited by Simone Barbosa, Cliff Lampe, Caroline Appert, and David A. Shamma, 1–7. Association for Computing Machinery. https://doi.org/10.1145/3491101.3519866.
Rebolledo Font De la Vall, Roxana, and Fabián González Araya. 2023. “Exploring the benefits and challenges of AI-language learning tools.” International Journal of Social Sciences and Humanities Invention 10 (1): 7569–76. https://doi.org/10.18535/ijsshi/v10i01.02.
Sütçü, Selim Soner, and Elif Sütçü. 2023. “English teachers’ attitudes and opinions towards artificial intelligence.” International Journal of Research in Teacher Education (IJRTE) 14 (3): 184–93. https://doi.org/10.29329/ijrte.2023.598.12.
Timpe-Laughlin, Veronika, Tetyana Sydorenko, and Phoebe Daurio. 2022. “Using spoken dialogue technology for L2 speaking practice: What do teachers think?” Computer Assisted Language Learning 35 (5–6): 1194–217. https://doi.org/10.1080/09588221.2020.1774904.
Wegerif, Rupert, and Louis Major. 2023. The Theory of Educational Technology: Towards a Dialogic Foundation for Design. Taylor & Francis.
Yesilyurt, Yusuf Emre. 2023. “AI-enabled assessment and feedback mechanisms for language learning: Transforming pedagogy and learner experience.” In Transforming the Language Teaching Experience in the Age of AI, edited by Galip Kartal, 25–43. https://doi.org/10.4018/978-1-6684-9893-4.ch002.
Yuan, Yijia. 2023. “An empirical study of the efficacy of AI chatbots for English as a foreign language learning in primary education.” Interactive Learning Environments 32 (10): 6774–89. https://doi.org/10.1080/10494820.2023.2282112.
Yue, Miao, Morris Siu-Yung Jong, and Davy Tsz Kit Ng. 2024. “Understanding K–12 teachers’ technological pedagogical content knowledge readiness and attitudes toward artificial intelligence education.” Education and Information Technologies 29: 19505–36. https://doi.org/10.1007/s10639-024-12621-2.

Appendix 1

Questionnaire 1 (adapted from Galindo-Domínguez et al. 2024a)

1 I am willing to use artificial intelligence in my teaching practice.
2 I am willing to explore new opportunities for integrating AI into teaching and learning processes.
3 I would love to be able to use artificial intelligence in my work as a teacher.
4 I am interested in learning about artificial intelligence in education.
5 I am interested in exploring the use of artificial intelligence as a complementary tool for my teaching practice.
6 The growing development of artificial intelligence in education is exciting to me.
7 Artificial intelligence should be introduced as part of teacher training.
8 There are many potential benefits to applying artificial intelligence in education.
9 I will stay up to date with the latest utilities and applications of artificial intelligence.
10 I will continue learning about artificial intelligence.
11 I don’t see how artificial intelligence could be relevant to my teaching practice.
12 I am convinced that artificial intelligence will have a positive impact on education.
13 Artificial intelligence will positively revolutionise education.
14 I hope that artificial intelligence can help me engage my students more.
15 Artificial intelligence can be used to assist students.
16 Artificial intelligence can be used to support students with specific educational needs (special educational needs students, gifted students, etc.).
17 Artificial intelligence can promote more personalised teaching.
18 Artificial intelligence can be used to create more personalised teaching materials.
19 Artificial intelligence can help my students perform better on school assignments.
20 Artificial intelligence can facilitate assessment and provide feedback to my students.
21 I have never interacted with artificial intelligence in an educational or general context.
22 I have had positive experiences with the use of artificial intelligence in education.
23 I have extensive experience with the use of artificial intelligence in education.
24 I can share my knowledge and skills about artificial intelligence with other teachers.
25 I have had some experiences with the use of artificial intelligence in education.

Appendix 2

Questionnaire 2

1 What is your perspective on the potential of using AI in EFL lessons?
2 What problems do you see with using AI in EFL lessons?
3 What AI tools have you used or would like to use in class, if such tools exist?
4 Can you describe your experiences with using AI tools or materials in EFL lessons? Please describe the content and purpose of their use.
5 How does AI affect EFL curriculum/syllabus design and lesson planning? Can you name some examples?
6 How do you think AI influences EFL teaching strategies and student motivation? Can you describe some examples?
7 In your opinion, what are some ethical and social problems of using AI in (EFL) lessons? How should they be dealt with by the teachers and institutions?
8 How do you envisage the future of using AI in EFL lessons? How can teachers prepare themselves and their students?

Part V: Translation Studies

2025, Vol.
22 (1), 153-170(228)
journals.uni-lj.si/elope
https://doi.org/10.4312/elope.22.1.153-170
UDC: [811.111’25=163.6:33]:004.8

Nataša Gajšt
University of Maribor, Slovenia

Applications of AI-driven Tools in Translating and Drafting Commercial Correspondence – A Slovenian-English Perspective

ABSTRACT

The recent emergence and widespread use of AI-driven tools have significantly affected various aspects of human communication, including business-related professional communication. This pilot study explores how AI-driven tools can be used in drafting commercial correspondence while observing its genre conventions. To this end, we carried out a small-scale study to assess AI-driven tools for translating and drafting commercial correspondence. We used ChatGPT, Claude 3.5 Sonnet and Gemini 2.0 Flash to translate 15 letters from Slovenian into English and to draft 10 letters in English based on prompts in Slovenian. Our key findings show that although the translations are similar, slight differences occur mainly at the level of formality and in the scope of formulaic expressions. Concerning the drafts, the AI-driven tools produced adequate letters which may occasionally require light human editing.

Keywords: Business English, commercial correspondence, translation, drafting, AI-driven tools, English, Slovenian

Uporaba orodij umetne inteligence pri prevajanju in sestavljanju poslovnih dopisov – slovensko-angleški vidik

IZVLEČEK

Nedavni razmah in obsežna uporaba orodij, ki temeljijo na umetni inteligenci (UI), imata velik vpliv na različne vidike človeške komunikacije, vključno s strokovno komunikacijo v poslovnem okolju. Ta pilotna študija ugotavlja, kako lahko orodja, ki jih poganja UI, uporabimo pri pisanju poslovne korespondence z upoštevanjem njenih žanrskih značilnosti. V ta namen smo izvedli manjšo raziskavo, v kateri smo ocenjevali orodja, ki temeljijo na UI, za prevajanje in oblikovanje poslovne korespondence.
Uporabili smo ChatGPT, Claude 3.5 Sonnet in Gemini 2.0 Flash za prevod 15 pisem iz slovenščine v angleščino ter za pripravo 10 pisem v angleščini na podlagi navodil v slovenščini. Naše ključne ugotovitve kažejo, da so prevodi med seboj razmeroma podobni, vendar se rahle razlike pojavljajo predvsem na ravni formalnosti in obsega rabe ustaljenih izrazov. Uporabljena orodja UI so pripravila ustrezna pisma, ki pa vendarle včasih potrebujejo manjše popravke.

Ključne besede: poslovna angleščina, poslovna korespondenca, prevajanje, pisanje, orodja UI, angleščina, slovenščina

1 Introduction

In the past couple of years, the surge in the use of AI-driven tools (e.g., ChatGPT) has greatly affected text production and, consequently, written communication. These tools can be used to prepare written documents for both general and professional purposes, including various types of business-related documents. Among the latter, they can serve as an aid in translating and drafting commercial correspondence. According to a recent study by Cardon et al. (2023), AI-driven tools have transformed the way people communicate for business purposes. Since several AI-driven tools can translate texts from one language to another, their use can be especially beneficial to businesses, given current trends of increasing internationalization of business operations and the consequent communication in, predominantly, English (Halimi and Shiyab 2015). Employees working in sales and purchasing may take advantage of these tools when communicating with business partners and customers in a language different from their first language. A substantial proportion of business communication is carried out in writing (Halimi and Shiyab 2015).
Thus, it is crucial that any commerce-related letter be appropriately structured and written in a clear and professional manner. A well-structured commercial letter or e-mail allows the recipient to easily understand the message and the action they need to take based on its content. A well-written message also demonstrates the sender’s professionalism and competence and their respect for the receiver. This professionalism adds to credibility and trust among business partners and customers and contributes to strong business relationships. On the other hand, a poorly written message can harm a company’s reputation. In other words, there is a correlation between well-written business-related communication and positive business results (Rogerson-Revell 2007, 1). The primary purpose of commercial correspondence is to address commerce-related matters (e.g., product enquiries, order confirmations, or complaints) (Ashley 2003). Therefore, it should be written clearly, concisely and without any ambiguities. Well-written commercial correspondence increases the chances of achieving the set goals, e.g., agreeing on proposed sales terms or concluding the sale. In short, commercial correspondence should be written in an appropriate professional tone, observing genre conventions and using accurate specialised, sales-related terminology (Talbot 2009; Wilson and Wauson 2010; Sankrusme 2017). In light of the above, the overall goal of this paper is to explore the ways in which AI-driven tools can be used to (1) translate commercial correspondence from Slovenian into English, and (2) draft commercial correspondence in English based on prompts in Slovenian, by examining specific elements related to the genre conventions of commercial correspondence. First, we present the theoretical framework for our study. Second, we describe how we carried out our research. Third, we present and discuss the results of the study.
In the final part, we summarize our findings and propose potential areas of further research together with implications for practice.

2 Theoretical Framework

This theoretical framework first gives an overview of commercial correspondence as a specific text type and text genre. Second, it examines the application of AI-driven tools in the context of translating and drafting commercial correspondence in English.

2.1 Commercial Correspondence as Text Type and Text Genre

Commercial correspondence refers to professional written texts related to the sale and purchase of goods and the provision of services. At its core, it is a communication channel between buyers and sellers as the two key participants in commercial transactions. Its purpose is to address aspects of commercial transactions: enquiries and replies to enquiries regarding general terms of sale and terms of payment, quotations, placing of orders and replies to orders, complaints and replies to complaints, etc. (Abegg and Benford 1999a, 1999b; Armitage-Amato 2005; Ashley 2003; Sankrusme 2017; Bennie 2021). Viewed through the prism of systemic functional linguistics (Halliday and Matthiessen 2004, 61), commercial correspondence is created for specific communication purposes within the business context (the ideational level). Commercial correspondence also creates the relationship between the seller and the buyer by laying down their rights and obligations (the interpersonal level). The third level, the textual level, is the actual linguistic realisation of the purpose of the message and of the interpersonal relationship between the two parties. This level is shaped by the lexical and grammatical characteristics and the purpose of commercial correspondence, and is realized through the typical structure of commercial letters.
This view of commercial correspondence shows that it needs to be considered both as a text type and as a text genre. As a text type (see Krajnc Ivič (2020) for a definition), commercial letters can be classified as a professional text type because they integrate the use of specialized sales-related terminology and form part of written business discourse that serves to fulfil specific tasks or functions. Via commercial correspondence, a company builds rapport with partners, suppliers, and customers, thus establishing and maintaining sales-related cooperation. More specifically, commercial correspondence is used to convey specific information (e.g., product or service details, prices, discounts, terms of delivery, or terms of payment), to negotiate and confirm sales-related agreements (e.g., stating and negotiating the terms and conditions of sale of a particular good or service), or to address any issues arising from non-performance of either the seller’s or the customer’s obligations (e.g., dealing with customer complaints, delivery or payment delays, or faulty products) (Davis 2010). Like the ideational, interpersonal and textual levels of texts, the concepts of text type and text genre are also interrelated (Krajnc Ivič 2020). While text types are defined via the functions of a specific group of texts (i.e., the ideational and interpersonal levels), text genres are defined via the structure of this same group of texts (i.e., the textual level). As a specific text genre within the broader context of business-related communication, commercial correspondence should adhere to its established structural and linguistic conventions. Above all, commercial correspondence letters should follow a clear structure, which includes the salutation, the main body (the message of the text) and an appropriate closing (Ashley 2003; Lougheed 2003; Taylor 2012).
Although the content of these letters varies, it is recommended that the information be presented in a clear and logically structured way. If the contents are complex, bullet points may be used to increase the readability of the text (Wilson and Wauson 2010). The structure of commercial correspondence letters is rather uniform and generally consists of four sections: the introduction, the core of the letter, the action required based on the letter, and a polite and positive ending. In the introduction, the sender frames the message in a context known to both the sender and the receiver (e.g., a reference to an advertisement, or to previous contact or correspondence). The core of the letter addresses the reasons for writing (e.g., an enquiry about a product, or a reminder about a payment) and guides the reader to the next section, which provides information about the action that is expected from the receiver based on the previous section (e.g., sending a reply with the requested product information, or addressing concerns about late payment). The body of the letter ends with a polite and positive conclusion in which the sender expresses gratitude for the reader’s attention to the letter, a desire for continued cooperation, and a clear indication of the next steps. As regards the language of commercial correspondence, several key observations should be made. As professional written communication, commercial correspondence should primarily be written in a professional tone. That is, the language used should be professional and polite, without colloquial expressions. However, an overly formal and somewhat outdated style of writing is also discouraged, particularly given the role of English as the lingua franca of the international business world (Terk 2016; Terk and Chan 2014; Wallwork 2014; Gajšt 2014).
The current trend in business writing leans towards a neutral, straightforward style (Abegg and Benford 1999b; Taylor 2012), which adds to the clarity and conciseness of the message (Wilson and Wauson 2010, 454; Carey 2002). Finally, the language in commercial correspondence letters should be polite to reflect respect and professionalism on the part of the sender. Linked to genre conventions and the professional tone and style of writing in English, two characteristics should be pointed out: the use of the passive voice and nominalization. In general, the passive voice is used to place focus on the action rather than on the doer of the action (e.g., when the doer of the action is unknown or irrelevant, when highlighting the doer may be sensitive in nature, or to avoid personal pronouns such as you or we) (Biber et al. 2021; Leech and Svartvik 1990; Quirk et al. 1985; Hribar 2021, 2018; Kalin Golob 2002). In commercial correspondence, the use of the passive voice may be appropriate in complaints, refusals or other types of messages where direct reference to the doer of the action may not be appropriate from a politeness standpoint (e.g., assigning ‘blame’). From the perspective of using plain English in business-related writing in an international context, the passive voice should be used only when absolutely needed (Bailey 1996; Taylor 2012). The second characteristic is the use of nominal structures. Nominalization is common in professional texts since it adds to the formality and conciseness of the message. Like the passive voice, it depersonalizes messages (‘They delayed the shipment.’ vs ‘There was a delay in shipment.’), compacts them and adds to the formality of the text (‘The shipment of the purchased goods will begin next month.’ vs ‘We will begin shipping the purchased goods next month.’). However, nominal structures may make a text more difficult to read; in such cases, verbal structures are preferred.
Summing up, good business writing in English in the international context should be polite, accurate, brief and clear (i.e., written in plain English and in an easy, natural style).

2.2 AI-Driven Tools for Text Drafting and Translation

Today, AI-driven tools which can be used either to translate a text into English or to write it in English based on a prompt in another language are widely available. They can perform a wide variety of tasks, from grammar checks to creating written texts without much human intervention (Marzuki et al. 2023, 2). Several studies have examined the usefulness of these tools for text production and text translation. Most of these address such tools in a pedagogical context, as an aid in developing writing or translation skills in a foreign language. Several studies have shown that students favour the use of AI as an aid in their learning, which was also supported by the results of writing tests and improved language proficiency (O’Neill 2016; Emara 2024; Kruk and Kałużna 2025). On the other hand, some studies have shown that the overuse of AI translation systems, despite saving time and increasing efficiency, can impair the development of independent writing and hinder critical thinking and deeper learning (Jaruwatsawat et al. 2024). That is, overreliance on AI-driven tools may turn users into passive users of these tools. AI-driven tools for text production and text translation have both strengths and weaknesses. Regarding their strengths, they are fast, easily accessible and cost-effective (Saitkhanova 2024; Moneus and Sahari 2024). They are designed to continuously evolve and improve their output with every user interaction (e.g., regarding linguistic patterns and idiomatic expressions). In addition, they can translate between multiple languages, which caters for diverse translation needs (Saitkhanova 2024; Suhardiman et al. 2024).
In contrast, the main reported weaknesses or limitations of AI-driven tools for translation lie in contextual understanding, cultural sensitivity and the capacity to deal with complex documents. They have a limited ability to understand nuances in language, idiomatic expressions and metaphors, and are not always able to fully comprehend cultural references, which may result in inappropriate translations. When it comes to complex documents or highly specialized texts (e.g., medical, technical, or legal), AI-driven tools do not show a high degree of accuracy, for instance when translating highly specialized terminology (Moneus and Sahari 2024; Suhardiman et al. 2024; Alisherovich 2024). The challenges related to the uncritical use of AI-powered tools in translation thus demand a careful and critical evaluation of AI-translated texts in the post-translation stage to detect and eliminate inaccuracies (Ning and Ban 2024). Orel Kos (2024) reports a similar finding in a study concerning translation for the screen, where subtitles produced with the help of machine translation are often of lower quality than those done exclusively by humans. Although AI offers quick, cost-effective translation, human post-editing of AI-translated texts has proven beneficial for specialized texts such as legal documents, marketing materials, technical manuals, and business reports, as it improves the linguistic accuracy, clarity and cultural appropriateness of the texts (Suhardiman et al. 2024).

2.2.1 AI-Driven Tools for Translation and Drafting of Commercial Correspondence – Case Selection

For our study, we decided to use the following AI-driven tools: ChatGPT 4 (OpenAI n.d.), Claude 3.5 Sonnet (Claude.ai n.d.), and Gemini 2.0 Flash (Gemini 2024), since these offer mostly free access and can thus be used by anyone.
Moreover, they can be used both for translating texts from Slovenian into English and for drafting letters in English based on prompts in Slovenian. To get a better understanding of these three AI-driven tools, we prompted them to describe their own scope, strengths and weaknesses regarding their ability to translate and draft commercial correspondence letters. Our prompt was: “How would you describe yourself in terms of your scope, strengths and weaknesses when it comes to writing and translating commercial correspondence?” The responses are given in Tables 1 to 3 below.

Table 1. Self-described scope of the selected AI-driven tools concerning commercial correspondence translation and production.

ChatGPT
• writes and translates commercial correspondence, including business inquiries, quotations, orders, contracts, invoices, and complaint letters,
• ensures clarity, professionalism, and adherence to international business standards,
• human editing advisable to refine business communication.

Claude 3.5 Sonnet
• can assist with virtually all forms of written business communication, including formal business letters, emails, memoranda, reports, proposals, meeting minutes, and basic contractual documents,
• can work across multiple languages for translation purposes,
• can adapt content for different regions and cultural contexts,
• can handle both internal and external communications, from entry-level to executive correspondence.
Gemini 2.0 Flash
• well-trained on formal language and professional communication styles,
• can adapt to different levels of formality as needed,
• training data includes a substantial amount of business-related vocabulary – to generate text with appropriate terminology,
• can produce various types of commercial correspondence, including letters, emails, memos, proposals, reports, and more,
• can adhere to standard formatting conventions,
• can translate commercial correspondence between multiple languages, facilitating international business communication.

(OpenAI n.d.; Claude.ai n.d.; Gemini 2024)

Table 1 shows that these AI-driven tools claim to be very good at creating formal and professional business communication. They can use extensive business vocabulary and are able to create different correspondence formats even in a multilingual setting, which makes them useful in a business context. Specifically, the responses provided by Gemini 2.0 Flash focused on formal and professional tone, business vocabulary, common correspondence formats and the ability to translate between several languages. ChatGPT’s focus was on specialized writing and translating various forms of commercial correspondence; additionally, it claimed a focus on clarity and professional tone in its output. However, it also highlighted the need to combine the AI-driven tool with human editing. Lastly, Claude 3.5 Sonnet claimed the ability to write and translate a wide range of commercial correspondence at different company hierarchy levels and across various regional and cultural contexts.

Table 2. Self-described strengths of the selected AI-driven tools concerning commercial correspondence translation and production.

ChatGPT
• leverages AI tools to speed up translation while maintaining human oversight for final accuracy,
• ensures that commercial correspondence is precise, avoiding ambiguity that could lead to misinterpretation or disputes,
• follows standard structures for different types of business communication, e.g., formal salutations, closing remarks, and polite requests,
• uses appropriate business and trade-related terminology, ensuring that terms related to payment, delivery, and warranties are correctly translated and applied,
• can adjust tone and phrasing to match English and Slovenian business etiquette, ensuring politeness and professionalism.

Claude 3.5 Sonnet
• is effective at restructuring complex information into clear, readable formats,
• can also provide detailed explanations for suggested improvements, helping users understand the reasoning behind specific writing choices,
• can write clear, professional business communications in multiple languages,
• can maintain consistent tone and formality levels,
• can follow standard business letter formats and conventions,
• can adapt style for different business contexts and cultures,
• can translate while preserving meaning and intent,
• ensures proper terminology in specialized fields (e.g., financial or legal).

Gemini 2.0 Flash
• can quickly draft and translate correspondence, saving time and resources,
• can generally produce grammatically correct and error-free text, essential for maintaining professionalism in business communication,
• can maintain a consistent tone and style throughout a series of communications, which is important for branding and building relationships,
• can generate neutral and unbiased language, avoiding emotional or subjective language that might be inappropriate in a business context,
• can easily adapt to different formats and templates for various types of correspondence.

(OpenAI n.d.; Claude.ai n.d.; Gemini 2024)

Table 3. Self-described weaknesses of the selected AI-driven tools concerning commercial correspondence translation and production.

ChatGPT
• can handle standard business agreements, but highly complex legal clauses may require additional review by a legal expert,
• points out that AI-generated commercial correspondence often requires human refinement to ensure appropriate tone and phrasing, particularly in sensitive negotiations,
• points out that some expressions do not translate directly between Slovenian and English – careful human adaptation is required to maintain professionalism.

Claude 3.5 Sonnet
• cannot maintain ongoing correspondence beyond single sessions,
• points out the need for human review of any legally binding documents,
• cannot independently verify company-specific protocols or requirements,
• points out that translations, while accurate for general business purposes, should be professionally reviewed for critical documents,
• cannot access real-time business information or company-specific information,
• cannot sign documents or provide legally binding translations.

Gemini 2.0 Flash
• can sometimes miss subtle cultural nuances in business communication, which can vary significantly across countries and industries,
• might not be familiar with highly specialized jargon used in specific industries (this can affect both writing and translation accuracy),
• does not have a deep understanding of complex business situations or strategies (which limits its ability to craft truly effective and persuasive correspondence in nuanced situations),
• cannot fully grasp the context of a long-term business relationship or the history of previous interactions,
• industry-specific idioms or colloquialisms might not always be accurate or appropriate.
(OpenAI n.d.; Claude.ai n.d.; Gemini 2024)

According to the information they provided, the three AI-driven tools are efficient, adaptable and fast in translating and drafting commercial correspondence. For example, both Gemini 2.0 Flash and ChatGPT highlighted their speed and efficiency as well as their overall accuracy in translating and drafting commercial correspondence. All three AI-driven tools claimed to be able to adhere to genre conventions (i.e., observing the standard structures and formats of different types of texts), which includes appropriate levels of formality and tone. Gemini 2.0 Flash specifically highlighted its grammatical accuracy. The outputs of the three AI-driven tools show similarities regarding their weaknesses in translating commercial correspondence, i.e., the inability to spot the nuances of business culture, and their lack of knowledge of highly specific jargon, business-related colloquialisms and idiomatic expressions. Also, they may struggle with maintaining contextual awareness over a long stretch of time. Some AI-driven tools also admitted their lack of actual experience of the business world and their lack of emotional intelligence. ChatGPT specifically pointed out its shortcomings and the need for human editing when it comes to legally complex texts. Based on this framework, we formulated the following research questions:

Research question 1: How effectively do the selected AI-driven tools translate commercial correspondence from Slovenian into English in terms of commercial correspondence as a text genre?

Research question 2: How effectively do the selected AI-driven tools generate commercial correspondence in English based on prompts in Slovenian in terms of commercial correspondence as a text genre?

3 Method

To answer our research questions, we designed a small-scale pilot study. We selected three freely available AI-driven tools: ChatGPT 4, Claude 3.5 Sonnet and Gemini 2.0 Flash.
We performed our analysis for the two research questions separately. Our open-ended approach enabled us to assess how closely the outputs adhered to the conventions of commercial correspondence as a text type and text genre. Concerning the first research question, we selected 15 commercial correspondence letters in Slovenian (enquiries, replies to enquiries, offers, quotations, and complaints). These were model letters we currently use to teach commercial correspondence in our Business English classes and were based on typical letters found in English-language commercial correspondence textbooks or guidebooks. We entered these letters into ChatGPT 4, Claude 3.5 Sonnet and Gemini 2.0 Flash and prompted the tools to translate them. We used the same prompt with all three tools: “Translate the following letters into English.” We deliberately kept the prompt as simple as possible. After we obtained the outputs, we analysed them based on predetermined criteria. Regarding commercial correspondence genre conventions, we limited our study to politeness, nominalization, the use of the passive voice, and ‘ease of reading’ (as related to the use of English as a lingua franca in business), in line with the observations on commercial correspondence and the strengths and weaknesses of AI-driven tools presented above. For the second research question, we selected 10 prompts (i.e., instructions) for drafting commercial correspondence in Slovenian (enquiries, replies to enquiries, offers, and complaints). As in the case of the letters used for the first research question, these instructions are model samples we use in our classes to teach commercial correspondence writing in English. We entered them into ChatGPT 4, Claude 3.5 Sonnet and Gemini 2.0 Flash to obtain the letters in English.
We used the same prompt with all three tools: “Draft the letter in English based on the prompt in Slovenian.” After we obtained the outputs (the drafted letters), we analysed them based on the guidelines for commercial correspondence found in English commercial correspondence textbooks and handbooks and on the strengths and weaknesses of AI-driven tools, in order to identify linguistic and contextual differences. As the final step in our analysis, we performed the Flesch-Kincaid and the Gunning Fog tests to see which of the AI-driven tools produced the texts (translations and drafted letters) that were the easiest to read and thus closest to the recommended clear, simple style for writing commercial correspondence (especially in the international context). The Gunning Fog test was designed to reduce unnecessary complexity in business writing (“Readability Checker - Reading Level Calculator” 2024; Miller 2024). The 0–100 scale for the Flesch-Kincaid test is as follows: 0–50 Very difficult (‘CEFRL C2 level’), 50–60 Fairly difficult (‘CEFRL C1 level’), 60–70 Plain English (‘CEFRL B2 level’), 70–80 Fairly easy (‘CEFRL B1 level’), 80–90 Easy (‘CEFRL A2 level’), 90–100 Very easy (‘CEFRL A1 level’) (Linguapress.com n.d.). The 0–20 scale for the Gunning Fog test is as follows: 1–5 very easy to read, 5–8 ideal for average readers, 8–11 fairly difficult to read, 11–20 hard to read for most readers. This scale was designed with the United States education system and its corresponding levels of education in mind, i.e., primary school to graduate levels (Clickhelp.com n.d.). The average results of these tests are given separately for the translations and for the drafted letters.

4 Results and Discussion

In this section of the paper, we present and discuss our findings.
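For readers who wish to replicate the readability scoring described in the Method section, both measures rest on simple published formulas. The sketch below is our own minimal Python illustration, not the checker cited above; in particular, the vowel-group syllable heuristic only approximates true syllable counts, so scores may differ slightly from those of dedicated readability tools.

```python
import re


def count_syllables(word: str) -> int:
    """Naive heuristic: count vowel groups, discounting a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Gunning Fog index) for an English text."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    n_words = len(words)
    n_syllables = sum(count_syllables(w) for w in words)
    # 'Complex' words for Gunning Fog: three or more syllables.
    n_complex = sum(1 for w in words if count_syllables(w) >= 3)
    # Standard published formulas for the two indices.
    flesch = 206.835 - 1.015 * (n_words / sentences) - 84.6 * (n_syllables / n_words)
    fog = 0.4 * ((n_words / sentences) + 100 * n_complex / n_words)
    return round(flesch, 1), round(fog, 1)
```

On short, monosyllabic sentences such a function yields a high Flesch score and a low Fog index, while heavily nominalized prose is pushed towards the ‘difficult’ end of both scales.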
4.1 Using AI-Driven Tools to Translate Commercial Correspondence from Slovenian into English

The first research question addressed the ability of ChatGPT, Claude 3.5 Sonnet and Gemini 2.0 Flash to translate commercial correspondence into English. Since we provided the three tools with whole letters, we did not focus on the structure of the letters per se but on the differences and similarities in line with the selected criteria. At the outset, we should state that ChatGPT’s outputs included the advice that human editing might be advisable. Although we did not specifically state in our prompts that the texts were commercial correspondence, Claude 3.5 Sonnet’s response to our prompt began with the information that it “will translate the letters while maintaining the formal business style and structure of the original” (Claude.ai n.d.) (N.B.: the tone for these translations was set to ‘formal’). This showed us that the tool recognised the letters as business-related. We present and discuss our findings in three areas: formality levels (including the notion of politeness), nominalisation, and the use of the passive voice. The examples illustrating our findings are taken from different sections of the letters: the opening, the main body, the concluding part and the complimentary close. We give the Slovenian sentences and their English equivalents as translated by the three selected AI-driven tools. Concerning the notion of politeness, we found that, overall, the translations were similar. However, they displayed slight differences in the adverbs used to soften or highlight the politeness of a statement or action. The following examples in Tables 4–7, taken from different types of letters, illustrate this point.

Table 4. Translating polite expressions – expressing interest.

Source (Slovenian): V najnovejši številki revije Sports&Health smo videli vaš oglas za športna oblačila in se zanimamo za vaše izdelke, še posebej za oblačila za zimske športe.
ChatGPT: We saw your advertisement for sportswear in the latest issue of Sports & Health magazine and are very interested in your products, particularly clothing for winter sports.
Claude 3.5 Sonnet: We saw your advertisement for sportswear in the latest issue of Sports&Health magazine and are interested in your products, particularly in clothing for winter sports.
Gemini 2.0 Flash: In the latest issue of Sports&Health magazine, we saw your advertisement for sportswear and are interested in your products, especially clothing for winter sports.

Table 5. Translating polite expressions – expressing gratitude.
Source (Slovenian): Iskrena hvala za vaš dopis z dne 20. junija 20__, v katerem povprašujete po asortimanu naši izdelkov, ki smo jih predstavljali na sejmu Apimell v Italiji prejšnji teden.
ChatGPT: Thank you very much for your letter dated June 20, 20__, in which you inquire about the range of our products that we showcased at the ‘Apimell’ fair in Italy last week.
Claude 3.5 Sonnet: Thank you for your letter dated June 20th, 20__, inquiring about our product range presented at the ‘Apimell’ trade fair in Italy last week
Gemini 2.0 Flash: Thank you very much for your letter of June 20, 2024, in which you inquire about the range of our products that we presented at the ‘Apimell’ fair in Italy last week

Table 6. Translating polite expressions – request for action.
Source (Slovenian): Ker bi želeli naročene izdelke ponuditi našim kupcem že v začetku septembra, vas prosimo, da potrdite naročilo v teku 8 delovnih dni.
ChatGPT: As we would like to offer these products to our customers at the beginning of September, we kindly ask you to confirm the order within 8 working days.
Claude 3.5 Sonnet: As we would like to offer these products to our customers at the beginning of September, please confirm the order within 8 working days.
Gemini 2.0 Flash: As we would like to offer the ordered products to our customers at the start of September, please confirm the order within 8 working days.
164 Nataša Gajšt Applications of AI-driven Tools in Translating and Drafting Commercial Correspondence ...

Table 7. Translating polite expressions – request for action.
Source (Slovenian): Prav tako vas prosimo, da pri izstavitvi fakture upoštevate 3% popust, ki ga zagotavljate za naročila, dana do 15. junija 20__.
ChatGPT: We also kindly ask that you apply the 3% discount you offer for orders placed before June 15, 20__, when issuing the invoice.
Claude 3.5 Sonnet: We also request that you apply the 3% discount guaranteed for orders placed before June 15th, 20__, when issuing the invoice.
Gemini 2.0 Flash: We also kindly ask you to take into account the 3% discount you provide for orders placed by June 15th, 20__, when issuing the invoice.

The example in Table 8 illustrates how the selected AI-driven tools translated two consecutive sentences containing a request. The three tools used a variety of formulaic expressions and did not repeat the same expression in the second sentence.

Table 8. Translating polite expressions – request for action.
Source (Slovenian): Prosimo za vaš prodajni katalog ter cenik z navedenimi cenami CIP Maribor. Prav tako prosimo, da nam posredujete vaše prodajne pogoje, vključujoč količinske popuste ter dobavne roke.
ChatGPT: Please provide us with your sales catalog and price list, including CIP Maribor prices. We would also appreciate information on your sales conditions, including bulk discounts and delivery times.
Claude 3.5 Sonnet: Please send us your sales catalog and price list with prices quoted CIP Maribor. Additionally, please provide your sales terms, including volume discounts and delivery times.
Gemini 2.0 Flash: Please send us your sales catalog and price list with CIP Maribor prices listed. We also ask that you provide us with your sales terms, including volume discounts and delivery times.
Apart from politeness, we can also observe differences in the levels of formality, ranging from the more formal “apply” to “take into account” (Table 7) and from “provide us with” to “send us” (Table 8). Focusing on instances of nominalisation, we note that the small sample in our study did not include many such cases. The first example in Table 9 shows that two AI-driven tools translated a Slovenian verbal structure as a nominal structure, and only one tool (Gemini 2.0 Flash) kept the verbal structure. The second example, on the other hand, shows only one case of nominalisation among the translated sentences.

Table 9. Examples of nominalisation in translations.
Source (Slovenian): Veseli nas, da ste se tako hitro odzvali na naše naročilo.
ChatGPT: We appreciate your prompt response to our order.
Claude 3.5 Sonnet: We appreciate your prompt response to our order.
Gemini 2.0 Flash: We are pleased that you responded so quickly to our order.

Source (Slovenian): Upam, da bomo tudi v prihodnje lahko sodelovali enako dobro kot doslej.
ChatGPT: We hope that our future collaboration will be just as successful as in the past.
Claude 3.5 Sonnet: I hope we can continue to collaborate as successfully as we have done so far.
Gemini 2.0 Flash: I hope that we will be able to cooperate as well in the future as we have done so far.

Another instance of nominalisation can be observed in the next example. This time, nominalisation was present in the Slovenian sentence, and it was retained in the translations of two AI-driven tools.

Table 10. Example of nominalisation in English translations from a Slovenian nominal structure (buyer’s complaint – body of the letter).
Source (Slovenian): Po natančnem pregledu prispelega blaga smo ugotovili, da ste poslali drugačno število izdelkov, kot je bilo dogovorjeno.
ChatGPT: However, after carefully inspecting the delivered goods, we have noticed discrepancies in the quantity of items sent compared to what was agreed upon.
Claude 3.5 Sonnet: After a careful inspection of the received goods, we have discovered that you sent different quantities than agreed upon.
Gemini 2.0 Flash: After a careful inspection of the received goods, we found that you sent a different number of products than agreed.

Although English sentences containing verbal structures rather than nominalisations may be easier to read, they read as less formal: “prompt response” vs “you responded so quickly”; “our future collaboration” vs “we can continue to collaborate” and “we will be able to cooperate”. Concerning the example in Table 10, we argue that both translations of the Slovenian nominal structure “po natančnem pregledu” are written in a formal tone, i.e. “after carefully inspecting” and “after a careful inspection”. (However, it needs to be pointed out that ChatGPT’s output should be in a different tense to be grammatically correct, i.e., “…after carefully inspecting the delivered goods, we noticed discrepancies…”.) The passive voice is the third typical feature of professional texts. First, our pilot study showed that passive voice constructions in Slovenian were, as a rule, translated into English in the passive voice. On the other hand, we also found instances of translation from the active voice in Slovenian into the passive voice in English, as illustrated by the examples in Table 11.

Table 11. Examples of active and passive voice in Slovenian-to-English translations.
Source (Slovenian): Naročeno blago lahko dobavimo najkasneje v 30 dneh od prejema vašega naročila.
ChatGPT: Ordered goods can be delivered no later than 30 days from receipt of your order.
Claude 3.5 Sonnet: We can deliver ordered goods within 30 days of receiving your order.
Gemini 2.0 Flash: We can deliver the ordered goods no later than 30 days from receiving your order.

Source (Slovenian): Naše izdelke lahko pošljemo v lični darilni embalaži (cena posameznega pakiranja je dodatnih EUR 3,50 za posamezni izdelek).
ChatGPT: Our products can be packaged in elegant gift wrapping (at an additional cost of EUR 3.50 per item).
Claude 3.5 Sonnet: We can ship our products in elegant gift packaging (additional cost of EUR 3.50 per item for individual packaging).
Gemini 2.0 Flash: Our products can be sent in attractive gift packaging (the price of each individual packaging is an additional EUR 3.50 per item).

The final step in our analysis of the AI-generated translations of commercial correspondence was to perform the ease-of-read tests: the Flesch-Kincaid test and the Gunning Fog Index. The results are given in Table 12 for each AI-driven tool.

Table 12. Ease-of-read results per AI-driven tool (translated letters).
Flesch-Kincaid: ChatGPT 49.51; Claude 3.5 Sonnet 52.42; Gemini 2.0 Flash 56.80
Gunning Fog Index: ChatGPT 15.65; Claude 3.5 Sonnet 14.83; Gemini 2.0 Flash 14.44

The data in Table 12 show that the ChatGPT-produced texts are overall the most difficult to read among the three translations, and the Gemini 2.0 Flash-generated ones the easiest. This was also evident from the translated sentences, as shown by the following examples (Table 13).

Table 13. Examples of sentences – ease-of-read.
Source (Slovenian): Ker bi radi ohranili dobro sodelovanje z vašim podjetjem, vas prosimo, da sprejmete naše iskreno opravičilo.
ChatGPT: As we value our cooperation with your company, we sincerely hope you will accept our apology.
Claude 3.5 Sonnet: As we wish to maintain good business relations with your company, we ask that you accept our sincere apology.
Gemini 2.0 Flash: As we would like to maintain good cooperation with your company, please accept our sincere apology.

Source (Slovenian): Prosimo, da nam posredujete vaš aktualni izvozni cenik in pogoje dobave ter plačilne pogoje.
ChatGPT: We would appreciate it if you could provide us with your latest export price list, along with your delivery and payment terms.
Claude 3.5 Sonnet: Please provide us with your current export price list, delivery terms, and payment conditions.
Gemini 2.0 Flash: Please send us your current export price list and delivery terms, as well as payment terms.

These examples show differences in the levels of formality, with the ChatGPT-created translations being the most formal (i.e. “we sincerely hope you will accept our apology” and “we would appreciate it if you could provide us with”) and the Gemini 2.0 Flash-created translations the most colloquial (i.e. “please accept our sincere apology” and “please send us”), although the latter still exhibit politeness and some level of formality.

4.2 Using AI-Driven Tools to Draft Commercial Correspondence in English Based on Prompts in Slovenian

The second research question addressed the ability of ChatGPT, Claude 3.5 Sonnet and Gemini 2.0 Flash to draft commercial correspondence in English based on instructions in Slovenian. The tools were not given highly structured instructions as with the letters for research question 1. Instead, they were given comprehensive guidelines including the main pieces of information to be included in the letters (this information did not precisely follow the standard steps as prescribed by the advice on constructing commercial correspondence letters). Structure-wise, we found no major differences between the outputs of the three AI tools. As a rule, they all followed the typical ‘opening – body – conclusion’ format. Also, all three AI tools put the content of the letters into separate paragraphs, which further contributed to the overall visual presentation. The only major structural difference was the use of bullet points to make the letters easier to read.
Regarding the levels of formality, the use of passive voice and nominalisation, we concluded that the letters drafted by the AI-driven tools practically did not differ from those translated by the same tools. That is, the level of formality that was evident in each tool’s translations was also reflected in its drafted letters. This leads us to conclude that, within the scope of this study, these three AI-driven tools are very consistent in their output. Given the limitations of this paper, we do not include specific examples in this section. As with research question one, we also performed the ease-of-read tests on the AI-generated letters, the Flesch-Kincaid test and the Gunning Fog Index (see Table 14 for the results).

Table 14. Ease-of-read results per AI-driven tool (drafted letters).
Flesch-Kincaid: ChatGPT 38.33; Claude 3.5 Sonnet 30.10; Gemini 2.0 Flash 45.59
Gunning Fog Index: ChatGPT 16.62; Claude 3.5 Sonnet 19.39; Gemini 2.0 Flash 14.63

The data in Table 14 show that all three tools produced texts that are difficult to read according to the two ease-of-read tests, the most difficult texts being produced by Claude 3.5 Sonnet, followed by ChatGPT and Gemini 2.0 Flash. Compared with the results in Table 12, where the texts were translations, Claude 3.5 Sonnet here produced the most complex texts. Based on these scores, it might be assumed, within the scope of this pilot study, that Gemini 2.0 Flash and ChatGPT are more suitable for drafting commercial correspondence in line with the plain English guidelines and the trends regarding Business English as a lingua franca. This may also lead to the assumption that Gemini 2.0 Flash is the most suitable for the translation of commercial correspondence because it generates clear and easy-to-read texts in a rather neutral professional tone, avoiding excessive formality. That is, it seems to produce texts that prioritize readability without compromising on accuracy or professionalism.
All this, however, cannot be generalized beyond the scope of our pilot study. Based on the ease-of-read scores for translations and drafted letters alike, we conclude that ChatGPT’s outputs are the most formal and may be better suited to some legal contexts. For everyday commercial correspondence between buyers and sellers, however, especially since most participants in international business contexts are not native speakers of English, the less formal outputs produced by Gemini 2.0 Flash in particular would strike the right balance between the formality of commercial correspondence and the need for clear and easily readable commercial correspondence letters in English. As for Claude 3.5 Sonnet, its main strength lies in the fact that it offers the option of selecting the style of its outputs, i.e., normal, concise, explanatory and formal, thus enabling the user to adapt the message’s level of formality to its receiver and purpose. This, of course, can also be achieved with the other two AI-driven tools, provided that the prompts include instructions on the level of formality. Linking our findings with these AI-driven tools’ own descriptions of their capabilities, our pilot study indicated that all three can translate and draft various forms of sentences and commercial correspondence letters from Slovenian into English while maintaining a clear and professional tone and following standard formatting conventions.

5 Conclusion

The aim of our small-scale pilot study was to test how selected AI-powered tools can be used for translating and drafting commercial correspondence letters. To this end, we chose three freely available tools, ChatGPT, Claude 3.5 Sonnet and Gemini 2.0 Flash, and analysed the similarities and differences in their outputs.
Our findings have shown that all three AI tools performed their tasks in accordance with the general guidelines and principles of writing commercial correspondence in English in international business contexts. They accurately translated or drafted the messages in the given letters or instructions in Slovenian, and the tone of the outputs was largely appropriate, ranging from a more formal to a more neutral level of formality. As these tools are based on LLMs (large language models), their outputs are also grammatically accurate. In short, they are consistent in tone and style, and they follow the overall norms of commercial correspondence as a specific text genre. These AI-driven tools essentially have similar core capabilities when it comes to commercial correspondence in terms of professional communication styles in line with genre conventions as presented in English commercial correspondence textbooks and guidelines. Among the limitations of our pilot study is its scope, since it was based on a limited number of texts. Furthermore, we focused on a few selected elements of analysis; we chose not to analyse the terminological accuracy of translated specialized terms; and the tools we used may not consider the reader’s professional knowledge and background or familiarity with the topic of the message (a readability test issue). Regarding the linguistic capabilities of AI-driven tools for commercial correspondence translation and drafting, this pilot study did not test them from the perspective of other cultural contexts. Also, we did not focus on specialized terminology, as this would require a different study design and focus. In addition, we used only basic prompts, which might need to be refined. Despite these limitations, our qualitative pilot study offers valuable insight into the potential of AI applications in professional written communication.
Our findings will be of interest to linguists and professional users alike, as they provide a glimpse into the capabilities of AI-driven tools for translating or drafting professional texts. The findings could also have implications for teaching language and language for specific purposes to translation trainees (cf. Koletnik, Kirbiš, and Zupan 2023) and English language students (cf. Tica and Krsmanović 2024). Although a small-scale study, it adds to the knowledge of how AI, as a fast-evolving phenomenon, can facilitate written business communication. Yet we need to bear in mind that, despite its benefits, the outputs still need human oversight and potential revision – as was stated by the AI tools themselves when prompted to describe their abilities. A natural progression beyond this study could stem from its very limitations. Since AI-driven tools are evolving rapidly, new and more extensive studies are encouraged; these could include a larger body of texts in the analysis, compare the outputs after refining the prompts (e.g., by using a more neutral tone, or adapting the output to British English or American English standards), test the AI-driven tools’ translation capabilities on other professional text types and genres, or focus on the accuracy of terminology translation, the correct use of modal verbs, or the grammatical accuracy of the AI-driven tools’ outputs. Also, any specific aspect of genre conventions (such as the use of passive voice or other structures) could be analysed in greater detail.

References

Abegg, Birgit, and Michael Benford. 1999a. Communication for Business, Satzbausteine. Hueber Verlag.
—. 1999b. Communication for Business: Zeitgemäße englische Handelskorrespondenz und Bürokommunikation. Lehrbuch. Hueber Verlag.
Alisherovich, Raimov Lazizjon. 2024.
“The peculiarities of artificial intelligence and human translation.” Multidisciplinary Journal of Science and Technology 4 (6): 692–96.
Armitage-Amato, Rachel. 2005. Poslovni stiki, Angleščina: [dokumenti, pisma, e-sporočila, pogovori ...: jezikovni priročnik]. 1. izd. PONS. Rokus.
Ashley, A. 2003. Oxford Handbook of Commercial Correspondence. Oxford University Press.
Bailey, Edward P. 1996. Plain English at Work: A Guide to Writing and Speaking. Oxford University Press.
Bennie, Michael. 2021. Guide to Good Business Communications: How to Write and Speak English Well in Every Business Situation. How To Books.
Biber, Douglas, Stig Johansson, Geoffrey N. Leech, Susan Conrad, and Edward Finegan. 2021. Grammar of Spoken and Written English. John Benjamins.
Cardon, Peter, Carolin Fleischmann, Jolanta Aritz, Minna Logemann, and Jeanette Heidewald. 2023. “The challenges and opportunities of AI-assisted writing: Developing AI literacy for the AI age.” Business and Professional Communication Quarterly 86 (3): 257–95. https://doi.org/10.1177/23294906231176517.
Carey, John A., ed. 2002. Business Letters for Busy People: Time Saving, Ready-to-Use Letters for Any Occasion. Career Press.
Claude.ai. n.d. “Claude 3.5 Sonnet.” https://claude.ai.
Clickhelp.com. n.d. “Gunning Fog Index.” https://clickhelp.com/software-documentation-tool/user-manual/gunning-fog-index.html.
Davis, Kenneth W. 2010. Business Writing and Communication: The McGraw-Hill 36-Hour Course. 2nd ed. McGraw-Hill Professional.
Emara, Eman Abd El-Hafeaz Mohamad. 2024. “Using AI tools to enhance translation skills among basic education English major students.” CDELT Occasional Papers in the Development of English Education 86 (1): 339–80.
Gajšt, Nataša. 2014. “Business English as a lingua franca – A cross-cultural perspective of teaching English for business purposes.” ELOPE: English Language Overseas Perspectives and Enquiries 11 (2): 77–87. https://doi.org/10.4312/elope.11.2.77-87.
Gemini. 2024.
“Gemini 2.0 Flash.” https://gemini.google.com/app?hl=en-GB.
Halimi, Sonia, and Said M. Shiyab. 2015. Writing Business Letters Across Languages: A Guide to Writing Clear and Concise Business Letters for Translation Purposes. Cambridge Scholars Publishing.
Halliday, M.A.K., and Christian M.I.M. Matthiessen. 2004. An Introduction to Functional Grammar. 3rd ed. Arnold.
Hribar, Nataša. 2018. “Tvornik in trpnik.” Pravna praksa 38 (7–8): 46.
—. 2021. “Trpnik.” Pravna praksa 40 (33): 34.
Jaruwatsawat, Manassa, Chutimon Khiaosen, Waraporn Sriram, and Suphakit Phoowong. 2024. “EFL learners’ perspectives on using AI translation applications.” BRU ELT JOURNAL 2 (3): 252–67. https://doi.org/10.14456/bej.2024.17.
Kalin Golob, Monika. 2002. “Slovenščina v pravni praksi (73. del): ‘Kdo se boji trpnika?’.” Pravna praksa 21 (4): 35.
Koletnik, Melita, Andrej Kirbiš, and Simon Zupan. 2023. “Prevajalce poučujemo jezik drugače, mar ne?” Ars & Humanitas 17 (1): 109–23. https://doi.org/10.4312/ars.17.1.109-123.
Krajnc Ivič, Mira. 2020. “Obravnava besedil: Merila za razlikovanje med besedilno vrsto in besedilnim tipom.” Slavistična revija 68 (1): 55–71. https://srl.si/ojs/srl/article/view/2020-1-1-4.
Kruk, Mariusz, and Agnieszka Kałużna. 2025. “Investigating the role of AI tools in enhancing translation skills, emotional experiences, and motivation in L2 learning.” European Journal of Education 60 (1): 1–12. https://doi.org/10.1111/ejed.12859.
Leech, Geoffrey, and Jan Svartvik. 1990. A Communicative Grammar of English. Longman.
Linguapress.com. n.d. “Flesch-Kincaid readability and EFL.” https://linguapress.com/teachers/flesch-kincaid.htm.
Lougheed, Lin. 2003. Business Correspondence: A Guide to Everyday Writing: Intermediate. 2nd ed. Longman, Pearson Education.
Marzuki, Utami Widiati, Diyenti Rusdin, Darwin, and Inda Indrawati. 2023.
“The impact of AI writing tools on the content and organization of students’ writing: EFL teachers’ perspective.” Cogent Education 10 (2): 2236469. https://doi.org/10.1080/2331186X.2023.2236469.
Miller, Nic. 2024. “The Flesch–Kincaid readability test.” Flowpoint.ai. Flowpoint. https://flowpoint.ai/blog/flesch-kincaid.
Moneus, Ahmed Mohammed, and Yousef Sahari. 2024. “Artificial intelligence and human translation: A contrastive study based on legal texts.” Heliyon 10 (6): e28106. https://doi.org/10.1016/j.heliyon.2024.e28106.
Ning, Jing, and Haidong Ban. 2024. “Application of translation technology in AI-powered translation workshop.” The Educational Review, USA 8 (10): 1242–49. https://doi.org/10.26855/er.2024.10.008.
O’Neill, Errol M. 2016. “Measuring the impact of online translation on FL writing scores.” IALLT Journal of Language Learning Technologies 46 (2): 1–39.
OpenAI. n.d. “ChatGPT.” OpenAI. https://chatgpt.com/.
Orel Kos, Silvana. 2024. “Introduction of machine translation into audiovisual translation teaching.” ELOPE: English Language Overseas Perspectives and Enquiries 21 (1): 185–208. https://doi.org/10.4312/elope.21.1.185-208.
Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech, and Jan Svartvik. 1985. A Comprehensive Grammar of the English Language. Longman.
“Readability Checker – Reading Level Calculator.” 2024. Charactercalculator.com. https://charactercalculator.com/readability-checker/.
Rogerson-Revell, Pamela. 2007. “Using English for international business: A European case study.” English for Specific Purposes 26 (1): 103–20. https://doi.org/10.1016/j.esp.2005.12.004.
Saitkhanova, Aziza. 2024. “Artificial intelligence in translation: Benefits and drawbacks.” International Journal of Scientific Trends 3 (11): 70–76.
Sankrusme, Sinee. 2017. International Business Correspondence. Anchor Academic Publishing.
Suhardiman, Sani, Anggy Giri Prawiyogi, Dedy Frianto, Bunga Putri Maulia, and Zhuldiz Anay. 2024. “Need a translation?
AI or human.” The Conference of EFL Studies 1 (1): 12–23.
Talbot, Fiona. 2009. How to Write Effective Business English: The Essential Toolkit for Composing Powerful Letters, Emails and More, for Today’s Business Needs. Kogan Page Publishers.
Taylor, Shirley. 2012. Model Business Letters, Emails & Other Business Documents. Pearson Education.
Terk, Natasha. 2016. Writing at Work. The Write It Well Series on Business Communication. Write It Well.
Terk, Natasha, and Janis Fisher Chan. 2014. Effective Email: Concise, Clear Writing to Advance Your Business Needs. Write It Well.
Tica, Lena, and Ivana Krsmanović. 2024. “Overcoming the writer’s block? Exploring students’ motivation and perspectives on using ChatGPT as a writing assistance tool in ESP.” ELOPE: English Language Overseas Perspectives and Enquiries 21 (1): 129–49. https://doi.org/10.4312/elope.21.1.129-149.
Wallwork, Adrian. 2014. Email and Commercial Correspondence: A Guide to Professional English. Springer.
Wilson, Kevin, and Jennifer Wauson. 2010. The AMA Handbook of Business Writing: The Ultimate Guide to Style, Grammar, Usage, Punctuation, Construction, and Formatting. AMACOM/American Management Association.

Simon Zupan, Zmago Pavličič, Melanija Larisa Fabčič
University of Maribor, Slovenia
2025, Vol. 22 (1), 171-184(228)
journals.uni-lj.si/elope
https://doi.org/10.4312/elope.22.1.171-184
UDC: [811.111’367.622=163.6:62]:004.89

Machine Translation of Independent Nominal Phrases in Technical Texts

ABSTRACT

This paper deals with machine translations of independent noun phrases in technical texts, which are not part of any sentence structure but function on their own, typically in tables and illustrations. Such nominal structures are common in technical texts because they allow technical writers to increase lexical density and precision of expression. On the other hand, these phrases pose a challenge for machine translation engines, as their meaning depends on the context.
Independent noun phrases from a service manual, which were translated from English into Slovene by two different machine translation engines (DeepL and Google Translate), are considered in this paper. Their comparison with the original revealed some limitations of machine translation engines in translating noun phrases, since approximately half of the phrases showed a noticeable change in meaning.

Keywords: technical texts, machine translation, nominal phrases, translation shifts, technical translation

Strojno prevajanje samostojnih samostalniških besednih zvez v tehničnih besedilih

IZVLEČEK

Prispevek obravnava strojne prevode samostojnih samostalniških besednih zvez v tehničnih besedilih, ki niso del stavčnih struktur, temveč se pojavljajo zunaj konteksta, najpogosteje v preglednicah in grafičnih prikazih. Tovrstne besedne zveze se pogosto pojavljajo v tehničnih besedilih, saj piscem omogočajo večjo leksikalno gostoto in konciznost pri izražanju. Po drugi strani predstavljajo izziv za strojne prevajalnike, saj je njihov pomen odvisen od sobesedila. V prispevku so obravnavane samostoječe samostalniške besedne zveze iz servisnega priročnika, ki so bile iz angleščine v slovenščino prevedene z dvema različnima strojnima prevajalnikoma (DeepL in Google Translate). Njihova primerjava z izvirnikom je pokazala nekatere omejitve strojnih prevajalnikov pri prevajanju samostalniških besednih zvez, saj se je pri približno polovici besednih zvez opazno spremenil njihov pomen.

Ključne besede: tehnična besedila, strojno prevajanje, samostalniške besedne zveze, prevodni premiki, prevajanje tehničnih besedil

1 Introduction

Technical translation is a specialized branch of translation studies that focuses on conveying technical content across languages.
Despite its critical role in global communication, it has historically received less academic attention than other translation domains, even though it accounts for a sizeable portion of worldwide translation output (Kingscott 2002). As is the case with other types of texts, many technical texts today are machine-translated. One of the questions this raises is how translation engines deal with the specific characteristics of technical texts, such as the use of specialized terminology, lexical density, conciseness, or the frequent use of passive voice. The purpose of the present study is to examine how machine translation engines deal with independent nominal phrases, which are common in technical texts, where data is presented in tables or images. The article has two parts: in the first, theoretical part, the major features of technical translation and machine translation are presented. In the second, empirical part, independent nominal phrases from a service manual in English are compared to their translations generated by two machine translation engines and analysed. The article ends by drawing conclusions from the analysis.

2 Technical Translation

Technical translation is a field of translation studies that focuses on texts with technical content. Although it is often referred to together with scientific translation (e.g., Olohan 2016), significant differences exist between the two areas. The main characteristic of scientific texts is that they “discuss, analyze and synthesize information with a view to explaining ideas, proposing new theories or evaluating methods,” while technical texts are “designed to convey information as clearly as possible” (Byrne 2014, 2). Technical texts thus represent an applicative extension of scientific texts.
From a research standpoint, it is notable that, in comparison with some other fields of translation studies, this field has received little scholarly attention, given that technical translation is estimated to represent as much as 90% of global translation output (Kingscott 2002, 247). Indeed, according to the BITRA bibliography of translation research, only 9.3% of publications address technical translation (Aixelá 2004). In practical terms, technical texts refer to a variety of documents with technical content. These range from user manuals and expert technical reports written in narrative linear prose, on the one hand, to data sheets with tables, lists of nominal phrases and little context, on the other (cf. Byrne 2014, 58–73). In turn, technical writing features different textual and linguistic characteristics, depending on its purpose and target readers. One common observation is that the language of technical writing is expected to be clear, simple, and concise (Herman 1993, 11; Byrne 2014, 48). In contrast to literary texts, for example, technical texts typically do not abound in elements such as figures of speech, rhyme, or convoluted sentences; instead, technical writing is expected to be clear, objective, and unambiguous. Another characteristic that is directly or indirectly discussed in every treatise on technical translation (e.g., Galinski and Budin 1993; Byrne 2006; 2014; Olohan 2022) is terminology, which refers to a specialized subset of concepts and vocabulary that typify a particular subject area. Indeed, Pinchuck (1977, 19) claims that vocabulary is the most significant linguistic feature of technical writing. Although Newmark (2008, 151) refuted that, claiming that terminology usually constitutes only 5–10% of the total content of technical texts, terminology remains an essential element of technical writing.1
In contrast to other types of texts, technical texts often also visually distinguish themselves through multimodality, given that they include diagrams, graphs or photographs to complement the verbal text (Byrne 2014, 54). In addition to the use of the passive voice or the prevalence of the present tense, one prominent linguistic feature of technical texts is nominalization (Newmark 2008, 151; Olohan 2022, 329). The frequency of nominal structures in technical texts is not surprising, given that technical writing strives for conciseness, and nominal structures deliver precisely that: lexical density. In the discourse of science and technology, the phenomenon was analysed in detail by the functional linguist Michael Halliday (2004). According to him, one result of the evolution of technical writing was that it helped organize grammar as a resource for generating meaning in metaphorical ways. This meant that items such as adjectives and verbs referring to “qualities” and “processes” were first decoupled from their original lexical realizations, and then both meanings were recoupled through the new grammatical category of noun. One such example is the word length, which carries the quality of the adjective “long” (“quality”), but also belongs to the grammatical category of noun, i.e., the nominal meaning of “entity” or “thing.” Given that such words carry two category meanings, Halliday calls the phenomenon “grammatical metaphor.” The advantage of such structures is that it is possible to compress and combine multiple meanings into nominal phrases. On the other hand, this becomes a problem when overly concise and compressed structures become ambiguous or even incomprehensible (Byrne 2006, 83). One famous example is the phrase lung cancer death rates, which can mean anything from the number of deaths from lung cancer, on the one hand, to the amount of time in which patients with lung cancer die, on the other (Halliday 2004, 170).
As Halliday’s example also shows, the problem of ambiguity is compounded when such phrases appear with little or no context, which is often the case in technical writing, with its abundance of tables and illustrations. In Slovenia, the field of technical translation in conjunction with machine translation remains under-researched, with most scholars focusing on other types of translation (e.g., Mezeg 2023; Orel Kos 2024).

1 In Olohan’s (2022) monograph Scientific and Technical Translation, for example, the terms terminology and terminological appear over one hundred times on 250 pages.

3 Machine Translation

Machine translation (MT) automates the production of a target-language text from a source-language text. Over the decades, scientists have worked on various approaches to MT (for an overview, see Naveen and Trojovský 2024; Araghi and Palangkaraya 2024). Previously, the two most recognizable ones were Rule-Based MT and Statistical MT. In recent years, however, neural machine translation (NMT) has become the most promising new avenue, utilizing models loosely inspired by the human brain, which employ artificial neural networks (see, for example, Zhang and Zong 2020). NMT involves two phases: encoding and decoding. During the encoding phase, each word in the source text is given a distinct neural representation, or embedding. The word embeddings are subsequently combined to form a sentence-level representation. This process modifies the individual representations based on context, resulting in a contextualized interpretation. During the decoding phase, the sentence-level representation is systematically broken down to produce the target sentence one word at a time. These two phases are carried out by interconnected artificial neural networks – the encoder and the decoder – together forming a unified network (Pérez-Ortiz et al. 2022).
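The two phases described above can be made more concrete with a deliberately simplified sketch. The following toy Python example is illustrative only and does not reproduce the architecture of any actual NMT system: the two-dimensional “embeddings” and the miniature English and Slovene vocabularies are invented, and the attention-style weighting merely mimics how encoding contextualizes each word before decoding selects target words one at a time.

```python
import math

# Toy source-side "embeddings" (real NMT systems learn high-dimensional
# vectors from millions of sentence pairs; these values are invented).
EMB = {"front": [1.0, 0.0], "axle": [0.0, 1.0], "mast": [0.7, 0.7]}

# Invented target-side vectors for a toy Slovene vocabulary.
TGT = {"sprednja": [1.0, 0.0], "os": [0.0, 1.0], "jambor": [0.7, 0.7]}

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def encode(words):
    """Encoding phase: each word's embedding is mixed with every other
    embedding via attention-style weights, yielding contextualized vectors."""
    vecs = [EMB[w] for w in words]
    contextualized = []
    for q in vecs:
        weights = softmax([dot(q, k) for k in vecs])
        ctx = [sum(w * k[i] for w, k in zip(weights, vecs)) for i in range(len(q))]
        contextualized.append(ctx)
    return contextualized

def decode(ctx_vectors):
    """Decoding phase: emit, one word at a time, the target word whose
    vector is closest (by dot product) to each contextualized vector."""
    return [max(TGT, key=lambda t: dot(ctx, TGT[t])) for ctx in ctx_vectors]

print(decode(encode(["front", "axle"])))  # prints ['sprednja', 'os']
```

The sketch also hints at why context matters so much in NMT output: the decoder can only choose among representations shaped by the surrounding words, so an isolated phrase gives it very little to work with.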
NMT can generate non-existent target language words and fluent but inaccurate translations, and the fluency of the output may mask these inaccuracies. Like other technologies trained on large text datasets, NMT can also amplify biases present in its training data. NMT systems require significant training time, computing power, energy, and specialized hardware (GPUs). They also demand massive training datasets, which are unavailable for all language pairs (Kenny 2022). This reliance on automated solutions, however, may divert attention from the critical, in-depth analysis of the source text – a drawback also observed by Koletnik Korošec (2011), who noted that unstructured use of machine translation can undermine thorough source text evaluation. The present research used two publicly available, multilingual neural machine translation services, DeepL Translator and Google Translate. DeepL Translator, like most translation systems, employs artificial neural networks for text translation. These networks undergo training on extensive datasets comprising millions of translated texts. DeepL’s website (How does DeepL Work 2021) reports numerous enhancements to the underlying neural network methodology. While most publicly available translation systems are direct modifications of the transformer architecture, DeepL’s networks, although they incorporate elements of this architecture such as attention mechanisms, reportedly feature significant topological differences that contribute to a substantial improvement in translation quality compared to the public research state of the art. A strong focus is placed on the targeted acquisition of specialized training data to enhance translation quality. This involves the development of specialized crawlers designed to locate and automatically assess the quality of translations available online.
While public research typically employs supervised learning for network training, where the network iteratively compares generated translations with training data translations and adjusts weights based on discrepancies, DeepL reportedly utilizes additional techniques from other machine learning domains to achieve notable improvements. Training is conducted on networks with many billions of parameters. Emphasis is placed on efficient parameter utilization, enabling comparable translation quality to be achieved with smaller, faster networks. DeepL currently offers two distinct language models for the translation of specific language pairs: a classic model and a next-generation model. The classic language model uses DeepL’s established AI neural network architecture for translation and is available for all supported languages. Over 800 language combinations are currently possible, including Slovene. DeepL Translator also supports translations into British English and American English. The next-generation language model is powered by a large language model (LLM) infrastructure. This LLM leverages extensive multilingual text corpora to address complex problems and is specifically trained for translation. Using proprietary LLMs within the next-generation model improves translation quality, particularly for longer texts. Specialized LLM infrastructure, uniquely tuned for language processing, facilitates more human-like translations and reduces the risk of hallucinations and misinformation. Furthermore, unlike general-purpose models trained on publicly sourced internet data, DeepL’s next-generation model benefits from over seven years of proprietary data curated for translation and content creation. Currently, however, the next-generation language model does not support Slovene (“About the Next,” n. d.). Google Translate is the second publicly available multilingual neural machine translation service used in the present research.
Like DeepL, it offers a website interface, mobile applications for Android and iOS, and an application programming interface (API). As of February 2025, it supports 249 languages and language varieties at various levels. Launched in April 2006 as a statistical machine translation service, it gathered initial linguistic data from United Nations and European Parliament documents and transcripts. For most supported language combinations, texts were initially not translated directly but first translated into English and then from English into the target language. In September 2016, Google’s research team announced the development of the Google Neural Machine Translation system (GNMT) to enhance fluency and accuracy, and in November of the same year, Google Translate transitioned to GNMT. This system employed an extensive end-to-end artificial neural network utilizing deep learning. GNMT improved translation quality compared to statistical machine translation by employing an example-based machine translation (EBMT) method, learning from millions of examples. Whole sentences were translated at once rather than piecemeal. This broader context facilitated the identification of more relevant translations, which were subsequently rearranged and adjusted for improved grammatical accuracy and human-like fluency. Since 2020, GNMT has been phased out and replaced by deep learning networks based on transformers. Despite advancements in automated translation, Google’s engineers acknowledge that its quality remains imperfect, especially for low-resource languages. Even the latest models are susceptible to common machine translation errors, such as “poor performance on particular genres of subject matter (domains), conflating different dialects of a language, producing overly literal translations, and poor performance on informal and spoken language” (Caswell and Liang 2020).
4 Empirical Study

To evaluate machine translation, independent nominal phrases were compared with their respective machine translations. “Independent nominal phrases” in this paper refers to phrases that meet the following two criteria: 1) they have nouns as their heads; and 2) they are not an integral part of any sentence but instead appear on their own, outside any (explicit) syntactic structure, in technical texts typically in tables and illustrations. The text used in the analysis was a service repair manual for the diesel and gasoline Caterpillar forklifts of the GP and DP 15K, 18K, 20K, 25K, 30K, 35K series (Pub. No. 99719-60120), which were produced between the mid-1990s and 2007 (Caterpillar LPG n. d.)2.

2 The authors want to thank Darko Rihard and Marko Fajfar from Vilfis d.o.o. for their help with forklift truck-related terminology.

The original text was in English and was available in electronic form as a readable PDF document. The source texts were not additionally pre-formatted before translation. The complete manual comprised 384 pages. For the study, the first thirty pages of the manual were machine-translated into Slovene using Google Translate (GT) and the professional (subscription-based) version of DeepL (DL). Next, the first one hundred independent units comprising nominal phrases from an illustration and a table on pages 1-2 to 1-5 were extracted and aligned with their two machine translations. Repetitions of identical phrases with identical translations were excluded. Most translation units were simple nominal phrases with single noun heads (e.g., front axle), including single-word phrases (e.g.
mast), while a small number of other units comprised sets of (appositional) nominal phrases separated by parentheses, slash or colon (e.g., Kg/mm (lb/in.); tread (front/double tires); applicable truck model designation 35: 3.3 ton class). These examples were treated as single translation units because they functioned as one unit of meaning. The original text included a few typos and grammatical errors, which remained uncorrected because the idea was to see how translation engines would deal with these. In total, the corpus of one hundred source units in English comprised a total of 305 words and 139 lemmas (e.g., truck, trucks are two words, the base form of which (truck) corresponds to one lemma)3. Some phrases reappeared in identical form in the corpus several times: the most frequent, for example, was serial number, which recurred six times; four phrases (e.g., simplex mast; duplex mast) appeared four times; the rest had fewer recurrences. All one hundred source units were compared to their corresponding translations generated by the two translation engines and evaluated qualitatively and quantitatively. In the absence of a specific model for describing translation shifts in acontextual nominal phrases, descriptors were adapted from other translatological models and theories, such as those by Leuven-Zwart (1989; 1990); Klaudy and Károly (2005); Toury (2012); and Krüger (2015). Following a preliminary comparison and analysis of various types of nominal phrases, the following descriptors were used to describe the relationship between the source and target translation units generated by the same translation engine: No shift. Source and target phrases have equal or near-equal semantic, formal, and functional properties. Examples include phrases such as general information, whose Slovene translation splošne informacije is considered both a formal and functional equivalent of the English phrase. 
Other examples include target phrases that have several possible lexical varieties (e.g., vrsta motorja or tip motorja for engine type), all of which are considered adequate.

1. Semantic shift. The semantic gap is too large to infer the meaning of the source phrase based on the translation. A typical example is the phrase duplex mast, which was translated as dvostranski jambor in Slovene. Although the head noun jambor corresponds to the English noun mast, it does so only in the context of sailboats; in the context of heavy machinery, the correct technical term in Slovene is jarem or teleskop. In addition, the adjective duplex, referring to the two stacks or sections of the mast that can be extended vertically, is translated as dvostranski, i.e., as two-sided, which likewise is a mismatch with the original meaning. The category also includes examples of made-up translations, which in the context of artificial intelligence are popularly referred to as hallucinations. One such interesting example is the word underclearnace (sic). As can be seen, the original technical term is misspelled and should have been spelled underclearance, referring to the physical distance between the frame of the forklift and the ground below it. The translation engine, however, “translated” the original phrase as podnaprava in Slovene, which is practically a nonexistent noun in the Slovene lexicon: only one or two references to it could be found, and even those came from the unrelated domain of emission allowances.

3 The corpus was analysed with Sketch Engine (http://www.sketchengine.eu).

2. Terminological shift. Source and target phrases overlap semantically to the extent that the meaning of the original phrase can be inferred; however, the term used is a general or non-standard expression and not an established or standard technical term.
An example of this type of shift is the phrase single wheels, referring to the number of parallel wheels at the same end of the forklift truck axle. While GT used the correct technical equivalent enojna kolesa in Slovene, DL translated the phrase as posamezna kolesa, whose back-translation is individual wheels. Although posamezna kolesa could apply in other contexts and its meaning can be deciphered, the correct technical term in this context is enojna kolesa. The category also includes examples of poor style such as poimenovanje, which appeared as the DL equivalent of the English phrase designation, referring to the type of forklift truck; a stylistically and terminologically better translation in Slovene would have been oznaka.

3. Grammatical shift. Source and target phrases have different grammatical features. Given that English is an analytic language and Slovene, as the target language, a synthetic one with several inflectional morphemes, target phrases are expected to deviate from the source ones grammatically; also possible are grammatical disagreements (e.g., in number or gender) within target phrases. One such example is the English phrase minimum intersecting isle (sic), which was translated as najmanjša otok, where the feminine suffix -a in the attributive adjective disagrees with the masculine head noun otok – the morphologically correct version of the phrase would be najmanjši otok.

4. Orthographic shift. The target text features orthographic shifts such as incorrect hyphenation, capitalization, etc. One example is the abbreviation Ref. No., where both abbreviated words are capitalized in English. The first letters in the corresponding Slovene translation Ref. Št. are likewise capitalized; however, this conflicts with the rules of Standard Slovene, according to which lower case should have been used in the second abbreviation.

5. Terminological inconsistency.
The same source phrase is translated in various ways in the target texts. Every first occurrence of a different rendering was counted. One such example in English is the phrase simplex mast, which appears in three different Slovene translations: as enostavni jambor; as dvostranski (sic) drog; and as simplex jambor.

6. No translation. In a small number of examples, no translation was provided, and the original source text phrase was reproduced in the target text, e.g.: the phrase [Mast] (square brackets used in the original text) appears as [Mast] also in Slovene (where the corresponding technical term is teleskop).

The descriptors were not discrete categories excluding one another. In a small number of units, two descriptors were used for the same translation. The target phrase najmanjša otok, for example, included a semantic shift as well as a grammatical error because of a gender disagreement between the adjective and the noun in the Slovene translation. Similarly, the phrase [podvozje] was an example of an adequate translation and thus marked as “no shift”; however, it was also marked for terminological inconsistency, because the previous iteration of the same original phrase (chassis) was machine-translated by the same engine as šasija.

5 Results

Overall, the results showed that in most categories, both engines produced translations of comparable quality. No translation shifts were observed in 46% of translations generated by DL, while GT performed slightly better with 49% of units with no translation shifts. On the other hand, both engines generated a similar proportion of translations with semantic shifts: 41% in DL vs. 42% in GT. GT also performed slightly better in terms of terminological shifts, which were observed in 8% of its translations, while in DL translations that proportion was 12%.
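For illustration, the tallying behind such a per-engine distribution can be sketched in a few lines of Python. The annotation format and the example units below are hypothetical, invented for the sketch, and the resulting numbers are not the study’s actual data; each aligned unit carries a (possibly empty) set of shift descriptors per engine, with an empty set counting as “no shift”.

```python
from collections import Counter

# Hypothetical annotations: one dict per aligned unit, mapping each engine
# to the set of shift descriptors assigned to its translation. Descriptors
# may co-occur on the same unit, mirroring the non-exclusive categories.
ANNOTATIONS = [
    {"DL": set(), "GT": set()},                        # e.g. general information
    {"DL": {"semantic"}, "GT": {"semantic"}},          # e.g. duplex mast
    {"DL": {"terminological"}, "GT": set()},           # e.g. single wheels
    {"DL": set(), "GT": {"semantic", "grammatical"}},  # e.g. minimum intersecting isle
]

def distribution(annotations, engine):
    """Return the percentage of units showing each descriptor for one engine."""
    counts = Counter()
    for unit in annotations:
        shifts = unit[engine]
        if not shifts:
            counts["no shift"] += 1
        for shift in shifts:
            counts[shift] += 1
    total = len(annotations)
    return {shift: 100 * n / total for shift, n in counts.items()}

print(sorted(distribution(ANNOTATIONS, "GT").items()))
# prints [('grammatical', 25.0), ('no shift', 50.0), ('semantic', 50.0)]
```

Because descriptors are not mutually exclusive, the percentages for one engine may sum to more than 100 – which is also why the row totals in Table 1 need not add up to exactly 100.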
In contrast, no grammatical shifts were observed in DL-generated phrases, while 4% of GT translations contained grammatical errors. Orthographic shifts were observed in 4% of both DL and GT translations. The only category with more noticeable discrepancies was terminological inconsistency, where no inconsistencies were observed in DL, whereas in GT, in 3% of the units the same technical term was translated in two or more ways. Only one percent of units remained untranslated by both translation engines. The distribution is presented in Table 1.

Table 1. Distribution of translation shifts (%).

Engine             No shift   Semantic   Terminological   Grammatical   Orthographic   Terminological    No
                              shift      shift            shift         shift          inconsistency     translation
DeepL                 46         41           12               0             4                0               1
Google Translate      49         42            8               4             4                3               1

As is the case with all quantitative data, the numbers show only part of the picture. The following is a detailed qualitative analysis of examples included in each of the seven categories.

5.1 No Shift

As indicated by the relative values, just under half of all translation units in the corpus showed no shifts, meaning that the phrases were considered functional equivalents of their source phrases. The analysis showed that this group of translation units could be divided into two subgroups. The first featured phrases which are considered common and are widely used in other, more general contexts. A typical example is the phrase General information, which appeared four times in the source English text. In both translations, it was consistently translated as Splošne informacije. The absence of shifts comes as no surprise, given that this is the standard version of the phrase in Slovene, which appears in a variety of contexts even beyond technical writing, as indicated by over 600 instances of the phrase in the largest corpus of Written Standard Slovene, Gigafida 2.0.
The same goes for other common phrases that also appear in non-technical writing, such as serial number or dimensions, whose respective Slovene equivalents serijska številka and dimenzije also appear relatively frequently in non-technical texts (cf. Gigafida 2.0). The engines also successfully dealt with a few terms that were considered more technical, such as output shaft, a mechanical part that connects the drive wheels and the gearbox, which was translated as izhodna gred by both DL and GT. However, it should be noted that neither of the translation engines translated the phrase as odgonska gred, which is another technical equivalent for the same mechanical part in Slovene.

5.2 Semantic Shifts

As evident from Table 1, over 40% of units in both translations included target text phrases whose meaning deviated from the source phrase to the extent that it made their understanding practically impossible. The analysis showed that most of these radical shifts were the result of two factors: 1) the ambiguity of phrases whose meaning is context-dependent; and 2) the properties of technical language and terminology that typically are not part of the general vocabulary. In most cases, the translation engines struggled with the same units – however, not always. A typical example of a unit whose meaning is context-dependent is the source noun truck. In the source manual, the noun is consistently used as the shorter, elliptical form of the longer phrase forklift truck, a standard expression for this type of industrial vehicle. In Slovene, however, the elliptical form poses a challenge for translation engines, given that the equivalent of truck in Slovene is tovornjak, which in turn is unrelated to forklift trucks; instead, tovornjak is the standard general Slovene term for a specialized vehicle for transporting freight. In translation, the meaning of the source unit thus changed radically.
It is also interesting that in those sections of the English manual where the complete phrase forklift truck was used, neither of the two translation engines had difficulties, and both consistently translated it as viličar, the Slovene equivalent of the phrase forklift truck; however, once the elliptical form appeared, both translation engines struggled with its interpretation, even though the phrase appeared in its longer full form elsewhere in the text. Lack of context also posed a problem in the one-word phrase reverse, which in this case referred to the travel speed for driving backwards. Although obratno, used by GT, is one of the lexical meanings of the adverb, it does not fit the context; instead, the adverb should have been vzvratno, as correctly identified by DL. A similar problem appears with the source phrase free lift, referring to “the distance a forklift operator can raise the forks without extending the mast” (“What is a free lift,” n. d.). In both machine translations, however, the phrase turned into brezplačno dvigalo, which could be back-translated as a free elevator and obviously bears no relation to the source phrase. The problem arose because the engines seemingly built the translation based on the headword lift in the source text. One of its lexical meanings is elevator; in turn, this most likely led to the use of the incorrect premodifier brezplačno, the Slovene lexical equivalent of the adjective free, i.e., something requiring no monetary compensation. In the given context, of course, the phrase is out of place. Similarly, work performance, describing the properties of the truck, was translated incorrectly by both engines as delovna uspešnost; the latter is an established phrase in Slovene, however not in the context of machinery but rather in labour relations.
A similar problem appeared in relation to the English phrase transmission serial number, which seems straightforward as a designation of the serial number of the assembly connecting the engine and wheels. Neither of the translation engines struggled with the common phrase serial number, and both adequately translated it as serijska številka. However, both misinterpreted the premodifying noun transmission. It is true that one of its lexical meanings is that of prenos (in the sense of a transfer), which is what both translation engines used, but in this case the meaning is misplaced, given that the phrase refers to a mechanical assembly.

5.3 Terminological Shifts

Terminological shifts were the third most common category of translation shifts. In contrast to semantic shifts, this category included translations that can be understood by readers but are terminologically inadequate because of a failure to comply with standard technical terminology. Both translation engines had comparable results in this category, with GT outperforming DL by a small margin. As was the case with semantic shifts, terminological shifts in both translation engines commonly appeared in one-word phrases whose meaning was highly context-dependent. An example of this is the word items, which appeared at the top of a column referring to the technical specifications of the forklift truck that are presented in the table. In DL, the noun was translated as elementi and in GT as predmeti. Although both are lexical equivalents of items, neither of the two translations fits the context; postavke would have been better. Similarly, poimenovanje (DL) and imenovanje (GT) are both close to one of the lexical meanings of the source word designation; however, oznaka is considered a more adequate technical translation. But multi-word units also posed a challenge. With some, the discrepancy was less noticeable than with others.
A case in point is the phrase disassembly diagram, referring to the diagrams in the manual that show the order or relationship in which parts are disassembled. Both engines translated the phrase as diagram razstavljanja, whose meaning is likely to be clear to most speakers of Slovene, although the established technical term in Slovene is shema razstavljanja. Another example of a phrase that the engines struggled with was travel speed. As the translation hitrost potovanja shows, the confusion likely arose from the noun travel, whose basic lexical meaning in Slovene is that of potovanje; in this context, however, the resulting phrase, in conjunction with the head noun hitrost, refers more to the pace at which tourists enjoy their travels.

5.4 Grammatical Shifts

Unsurprisingly, there were few grammatical shifts. None were observed in the DL translations, and only four in the GT translations. One of those was an example of gender disagreement between the headword and its premodifier (najmanjša otok), while two displayed a grammatical case mismatch (obremenitev porazdelitev instead of porazdelitev obremenitve and Powershift menjalnik modeli instead of modeli z menjalnikom Powershift). The last shift featured the longer phrase overall height (to top of mast lowered), where GT failed to incorporate the participle lowered into the translation; in turn, the resulting phrase featured an incorrect use of the participle in the postmodifying position: skupna višina (do vrha jambora spuščen).

5.5 Terminological Inconsistencies

Terminological inconsistencies were also infrequent. In DL, none were observed, while GT featured three units where the same term was rendered in various ways in the translation. The first involved the noun disassembly, which appeared in three different phrases.
In the first two, disassembly diagram and disassembly sequence, the noun was translated as razstavljanje; however, in the third iteration, suggestions for disassembly, the same noun appeared as demontaža, which is typically the standard technical term for the procedure described. The second term was chassis. When this noun appeared as part of a phrase, it was translated as šasija, which is an established technical term for chassis in Slovene. It is notable that in the third iteration, the noun appeared on its own as a single-word phrase and was translated as podvozje, which is a synonym for šasija in Slovene. The third and most notable example was the noun mast, referring to one of the main forklift parts, the mechanical implement for lifting or lowering the load at the front of the vehicle. In the original, it appeared as part of fifteen different phrases. The first of those iterations was the mast serial number, in which mast was translated as jambor by both engines. Although jambor is a Slovene lexical equivalent of mast, it only applies in the context of sailboats; in forklifts, the corresponding technical term is jarem or teleskop. On the same page in the manual, mast also appears in the phrase chassis and mast model identification. In this instance, GT translated the noun as drog. In the remaining iterations in the GT translation, the noun varied again between drog and jambor; in DL, it was consistently translated as jambor.

5.6 No Translation

Both translations included only one unit that remained untranslated by both translation engines: the word [mast]. One plausible reason for this was the square brackets, which may have led the engines to mistake the bracketed text for markup rather than translatable text. Another group of items that remained untranslated were imperial units of measurement, which accompanied metric units of measurement in brackets, e.g., mm (in.).
Given that metric units are standard in Slovene, the use of imperial units alongside them was acceptable.

5.7 Miscellaneous

An interesting case is the translation of words containing typos, which are not an uncommon phenomenon in texts. The case in point is the phrase underclearnace (at frame), referring to the distance between the chassis and the ground. DL translated it as podnaprava (v okvirju). Although the word sounds feasible in Slovene in terms of its form and morphological characteristics, it is not a common word and is hence an example of hallucination. DL’s misinterpretation also manifests itself in the prepositional phrase at frame, which indicates the point at which the distance from the ground is measured, whereas the Slovene translation places that same point inside the frame, suggesting that the engine misinterpreted it. It is also notable that the same word remained untranslated by GT.

6 Conclusions

This study highlights the challenges of machine translation in handling independent nominal phrases in technical texts. The comparison of Google Translate and DeepL translations into Slovene revealed both their strengths and their limitations in dealing with specialized terminology. Nearly half the translated units showed no shift, indicating that common phrases were adequately rendered. However, semantic shifts were prevalent (over 40%), often due to ambiguity and lack of contextual information. Key issues included the mistranslation of elliptical forms (e.g., truck instead of forklift truck) and the misinterpretation of industry-specific terms like free lift and transmission serial number. Terminological shifts affected precision, with general expressions replacing technical terms. While these translations were understandable, they lacked standard industry accuracy.
Grammatical and orthographic shifts were minimal, with DeepL producing no grammatical errors and Google Translate showing minor inconsistencies. However, terminological inconsistencies in Google Translate indicated weaker consistency mechanisms than in DeepL. A small number of untranslated units, such as mast, suggests formatting-related processing issues in machine translation engines. The study underscores the importance of integrating domain-specific resources and human post-editing, as well as pre-editing and pre-formatting texts, to enhance translation reliability. This observation aligns with the findings of Hazemali et al. (2024), whose evaluation of chatbot performance in reading digitized texts showed that while the chatbot used in the study exhibited some success in handling typos and minor language errors, it achieved only a 20% success rate in tasks demanding deeper language comprehension and struggled with complex sentence structures and domain-specific terminology in Slovene. Experienced human translators typically do not miss phenomena such as repetition in the immediate textual vicinity; in addition, humans can also process graphic representations of information. While both translation engines performed comparably, future improvements should focus on context recognition and the handling of specialized terminology. These findings, alongside evidence from studies in other domains – such as Mohar, Orthaber and Onič (2020), who demonstrated that machine translation quality deteriorates with increasing sentence complexity in literary texts – underscore the need for ongoing refinement of MT systems to better handle both technical and stylistically rich content.
It should be noted that technical translators, in contrast with, for example, literary translators, strive above all for precision and comprehensibility, “since the consequences of lexical error, however slight, are more serious: a poor literary translation leads to a dissatisfied reader, whereas a misleading technical translation could result in a hazard to human life” (Hann 1992, 7). Further research should explore the impact of context on machine translation accuracy and investigate AI-driven enhancements for better translation consistency and precision.

References

“About the next-generation language model.” n.d. DeepL Help Center. https://support.deepl.com/hc/en-us/articles/14241705319580-About-the-next-generation-language-model.
Araghi, Sahar, and Alfons Palangkaraya. 2024. “The link between translation difficulty and the quality of machine translation: A literature review and empirical investigation.” Language Resources & Evaluation 58: 1093–1114. https://doi.org/10.1007/s10579-024-09735-x.
Byrne, Jody. 2006. Technical Translation: Usability Strategies for Translating Technical Documentation. Springer.
—. 2014. Scientific and Technical Translation Explained. Routledge.
Caswell, Isaac, and Bowen Liang. 2020. “Recent advances in Google Translate.” Google Research Blog, June 8. https://ai.googleblog.com/2020/06/recent-advances-in-google-translate.html.
“Caterpillar LPG forklifts specifications.” n.d. Lectura specs. https://www.lectura-specs.com/en/specs/forklifts/lpg-forklifts-caterpillar.
Franco Aixelá, Javier. 2004. “The study of technical and scientific translation: An examination of its historical development.” Journal of Specialised Translation 1. https://jostrans.soap2.ch/issue01/art_aixela.php.
Galinski, Christian, and Gerhard Budin. 1993. “New trends in translation-oriented terminology management.” In Scientific and Technical Translation, edited by Sue Ellen Wright and Leland D. Wright, Jr., 209–16. John Benjamins.
Gigafida 2.0: Corpus of Written Standard Slovene. https://viri.cjvt.si/gigafida.
Halliday, M.A.K. 2004. The Language of Science, edited by J. J. Webster. Continuum.
Hann, Michael. 1992. The Key to Technical Translation. Volume 2: Terminology/Lexicography. John Benjamins.
Hazemali, David, Janez Osojnik, Tomaž Onič, Tadej Todorović, and Mladen Borovič. 2024. “Evaluating chatbot assistance in historical document analysis.” Moderna arhivistika 7 (2): 53–83. https://doi.org/10.54356/ma/2024/biub3010.
Herman, Mark. 1993. “Technical translation style: Clarity, concision, correctness.” In Scientific and Technical Translation, edited by Sue Ellen Wright and Leland D. Wright, Jr., 11–20. John Benjamins.
“How Does DeepL Work? #Network Architecture.” 2021. DeepL Blog, November 1. https://www.deepl.com/en/blog/how-does-deepl-work#network_architecture.
Kenny, Dorothy. 2022. “Human and machine translation.” In Machine Translation for Everyone: Empowering Users in the Age of Artificial Intelligence, edited by Dorothy Kenny, 24–46. Language Science Press.
Kingscott, Geoffrey. 2002. “Technical translation and related disciplines.” Perspectives 10 (4): 247–55. https://doi.org/10.1080/0907676X.2002.9961449.
Klaudy, Kinga, and Krisztina Károly. 2005. “Implicitation in translation: Empirical evidence for operational asymmetry in translation.” Across Languages and Cultures 6 (1): 13–28. https://doi.org/10.1556/Acr.6.2005.1.2.
Koletnik Korošec, Melita. 2011. “Applicability and challenges of using machine translation in translator training.” ELOPE: English Language Overseas Perspectives and Enquiries 8 (2): 7–18. https://doi.org/10.4312/elope.8.2.7-18.
Krüger, Ralph. 2015. The Interface between Scientific and Technical Translation Studies and Cognitive Linguistics with Particular Emphasis on Explicitation and Implicitation as Indicators of Translational Text-Context Interaction. Frank & Timme.
Leuven-Zwart, Kitty M. van. 1989. “Translation and original: Similarities and dissimilarities I.” Target 1 (2): 151–81.
—. 1990. “Translation and original: Similarities and dissimilarities II.” Target 2 (1): 69–95.
Mezeg, Adriana. 2023. “Ali sploh še potrebujemo prevajalce? Strojno prevajanje iz francoščine v slovenščino.” Ars & Humanitas 17 (1): 139–54. https://doi.org/10.4312/ars.17.1.139-154.
Mohar, Tjaša, Sara Orthaber, and Tomaž Onič. 2020. “Machine translated Atwood: Utopia or dystopia?” ELOPE: English Language Overseas Perspectives and Enquiries 17 (1): 125–41. https://doi.org/10.4312/elope.17.1.125-141.
Naveen, Palanichamy, and Pavel Trojovský. 2024. “Overview and challenges of machine translation for contextually appropriate translations.” iScience 27 (10): 110878. https://doi.org/10.1016/j.isci.2024.110878.
Newmark, Peter. 2008. A Textbook of Translation. Twelfth impression. Longman.
Olohan, Maeve. 2016. Scientific and Technical Translation. Routledge.
—. 2022. “Translating technical texts.” In The Cambridge Handbook of Translation, edited by Kirsten Malmkjær, 321–39. Cambridge University Press.
Orel Kos, Silvana. 2024. “Introduction of machine translation into audiovisual translation teaching.” ELOPE: English Language Overseas Perspectives and Enquiries 21 (1): 185–208. https://doi.org/10.4312/elope.21.1.185-208.
Pérez-Ortiz, Juan Antonio, Mikel L. Forcada, and Felipe Sánchez-Martínez. 2022. “How neural machine translation works.” In Machine Translation for Everyone: Empowering Users in the Age of Artificial Intelligence, edited by Dorothy Kenny, 141–64. Language Science Press.
Pinchuck, Isadore. 1977. Scientific and Technical Translation. André Deutsch.
Toury, Gideon. 2012. Descriptive Translation Studies and Beyond. Benjamins.
“What is a free lift on a forklift?” n.d. American Forklifts. https://americanforklifts.org/what-is-a-free-lift-on-a-forklift/.
Zhang, JiaJun, and Chengqing Zong. 2020. “Neural machine translation: Challenges, progress and future.” Science China Technological Sciences 63: 2028–50. https://doi.org/10.1007/s11431-020-1632-x.

2025, Vol. 22 (1), 185-201(228)
journals.uni-lj.si/elope
https://doi.org/10.4312/elope.22.1.185-201
UDC: [811.111’373.612.2:81’25]:[004.89:378]

Marija Brala Vukanović
University of Rijeka, Croatia

Translating (Metaphors) in the Age of AI: Opportunities, Challenges, and Implications for the EFL Classroom

ABSTRACT

The paper explores the use of AI translation tools in EFL classrooms, focusing on metaphor translation. We investigate the attitudes of first- and third-year English students at the University of Rijeka, Croatia, towards AI tools and evaluate three platforms – Google Translate, ChatGPT, and Glosbe – with regard to their ability to accurately translate metaphors. The findings show a generally positive student disposition towards AI tools but also highlight frequent inaccuracies in AI-generated metaphor translations. We discuss the implications of these results for EFL teaching, emphasizing the potential value of error correction as a pedagogical tool. Our analysis suggests that the limitations of AI tools can serve as valuable pedagogical resources for fostering critical engagement, improving students’ understanding of culturally and contextually embedded language, and enhancing their linguistic skills. Our findings underscore the need for an urgent and systematic integration of AI tools into classrooms.

Keywords: AI in education, machine translation, metaphors, error correction, English as a Foreign Language (EFL)

Prevajanje (metafor) v dobi umetne inteligence: priložnosti, izzivi in posledice za učilnico angleščine kot tujega jezika

IZVLEČEK

V članku raziskujemo uporabo prevajalskih orodij, ki temeljijo na uporabi umetne inteligence (UI), pri pouku angleščine kot tujega jezika (EFL) s poudarkom na prevajanju metafor.
Preučujemo stališča študentov in študentk prvega in tretjega letnika angleščine na Univerzi na Reki na Hrvaškem do orodij UI ter ocenimo zmožnosti natančnega prevajanja metafor na platformah Google Translate, ChatGPT in Glosbe. Ugotovitve kažejo na splošno pozitiven odnos študentov in študentk do uporabe orodij UI, a hkrati v raziskavi izstopa tudi pogosta netočnost pri prevodnih metaforah, ki jih prevede UI. Razpravljamo o pedagoških posledicah teh rezultatov, pri čemer poudarjamo didaktični potencial popravljanja napak kot učnega pristopa. Ugotavljamo tudi, da omejitve orodij UI lahko predstavljajo dragocena izhodišča pri pouku, saj spodbujajo kritično razmišljanje, pomagajo pri razumevanju kulturno in kontekstualno zaznamovanega jezika ter prispevajo k izboljšanju jezikovnih spretnosti. Ugotovitve podpirajo nujnost sistematičnega vključevanja orodij UI v pedagoški proces.

Ključne besede: umetna inteligenca v izobraževanju, strojno prevajanje, metafore, popravljanje napak, angleščina kot tuji jezik (EFL)

1 Introduction

Contemporary science in general, and cognitive linguistics in particular, embrace the view that human experience – and, by extension, human language – is profoundly conditioned and shaped by the human body, perception, and culture. More broadly, the relativistic view that language both reflects the conceptual structure of the speaker and simultaneously influences cognitive processes to some degree has been well established and validated within the field (Whorf 1956; Lakoff and Johnson 1980).
According to this perspective, language not only mirrors human thought but also actively contributes to shaping it.1 In the specific context of how language reflects and influences cognition, metaphors are often regarded as pivotal elements that guide thought processes, serving as “switching points” on the rail junctions of our ideas (Lakoff and Johnson 1980). Most fluent speakers of English know that the phrases ‘switching point’ and ‘rail junction’ used in the previous sentence are not to be understood literally – i.e., as referring to the point at a rail junction where trains can be sent in one direction or another. Instead, these two expressions are figurative devices that present metaphors as having the power to direct our ideas towards one among the many possible paths that our cognitive, or thought, processes can create. In fact, in the sentence under scrutiny, we have just resorted to metaphors to explain what metaphors are. In other words, we have compared the thought process to a train journey, and tried to illustrate the capacity of metaphors to direct, redirect and shape our ideas in a certain way (rather than some other possible one) by comparing metaphors to switching points at rail junctions that can send the train (i.e., thought) along different paths. Our aim was to render – as clearly as possible – the idea that the thought process can be strongly directed by metaphors. In fact, as our example illustrates, metaphors are linguistic tools that make us understand one thing (usually a more abstract one – in our case the cognitive process) in terms of another (usually a simpler, more ‘accessible’ one – in our case the train travelling on rail tracks that can and do go in different directions when directed at switching points).
If we now try to translate the sentence ‘Metaphors are switching points on the rail junctions of our ideas’ using a few AI tools,2 we get the following:

a) ChatGPT: Metafore su “točke prebacivanja” na željezničkim čvorištima naših ideja. (a literal translation with the inadequate lexical selections of ‘transfer points’ and ‘rail nodes’/‘railway crossings’)

b) Google Translate: Metafore su “sklopne točke” na željezničkim raskrižjima naših ideja. (a literal translation with the inadequate lexical selections of ‘switch points’ and ‘rail nodes’/‘railway crossings’)

c) Glosbe: Metafore su prekretnice na željezničkim raskrižjima naših ideja. (a literal translation with the inadequate lexical selection of ‘turning point/milestone’ and ‘rail nodes’/‘railway crossings’)

1 See, e.g., the work done in the past few decades by the interdisciplinary Language and Cognition Group (now subdivided into multiple groups) of the Max Planck Institute for Psycholinguistics in Nijmegen on how language influences perception, categorization, and conceptualization. For more information and references, visit https://www.mpi.nl. For a further range of studies on the interplay between linguistic structures and cognitive processes (e.g., the interplay between language and the perception of space and time), see the extensive body of work by Lera Boroditsky (e.g., Boroditsky and Gaby 2010). For recent work, see Maier and Abdel Rahman (2024).

2 These three AI translation tools – Google Translate, ChatGPT, and Glosbe – were chosen because they are the most commonly used tools in Croatia, among both students and professional translators (for more details, see Section 2 below). Google Translate and Glosbe use machine learning algorithms, while ChatGPT relies on advanced language models (like GPT) that involve deep learning.
All three translation versions are too literal and prove contextually and culturally inappropriate in the target language. In all three cases, the machine translation tool attempts to achieve lexical accuracy by proposing a literal translation of what it recognizes as (railroad) technical terms – namely switching point and rail junction (the English ‘switching point’ is rendered into Croatian as ‘transfer point’ in a), ‘switch point’ in b), and ‘turning point/milestone’ in c), while all three tools render ‘rail junctions’ as ‘rail nodes’/‘railway crossings’). At the same time, none of the three versions manages to convey the pragmatic value of the source language metaphoric expressions, and thus all fail to convey the message. All three versions have problems with the two metaphors in the source language sentence (metaphors seen as ‘switching points’ and ideas as having ‘rail junctions’). If we now turn to student translations, we get the following:

d) Translation by students:3 Metafore su skretničari na raskrižjima naših misli. (literally, ‘Metaphors are the switchmen at the junctions of our thoughts’)

Immediately, we note a stark contrast between the AI-generated translations and the human translation. While the former are overly literal and culturally inappropriate, the student version, characterized by a degree of creative liberty (i.e., a slight departure from the source language), functions effectively within the context and is perfectly adapted to the target language and culture. Building on these observations, and within the broader academic discussion surrounding the use of AI tools by English as a Foreign Language (EFL) students and translators (Gašpaović et al. in prep.), as well as the challenges faced by AI in idiomatic translation (Gašpaović et al. in prep.), in this paper we investigate the possibilities and challenges associated with metaphor translation by AI tools.
The issue is explored within the larger, applied context of AI tool usage in the EFL classroom. One of our main aims is to highlight the growing and urgent need to explore and standardize possible applications of AI in the classroom, starting from a detailed understanding of AI and its potential in language learning. The paper is structured as follows: after a brief literature review, we introduce the study, outlining the methodology and results. These results are then discussed, and the implications for AI translation tools in general – specifically in the context of metaphor translation – and their integration into EFL classrooms are addressed. The study concludes with a discussion of the current landscape of AI in pedagogy.

3 All the AI translations of metaphors under investigation in this study were also analysed by the students after they had completed the questionnaire, and the version labelled ‘translation by the students’ refers to the translation that the students agreed upon in class as the best option.

2 Setting the Stage

It is indisputable that the world is currently undergoing a profound digital transformation across all sectors. In terms of both speed and scale, this change can be described as tectonic. A central driving force behind this transformation is Artificial Intelligence (AI), defined here as the simulation of human intelligence processes by computer systems (see Healey 2020). The three translation tools under scrutiny in this paper – Google Translate, ChatGPT, and Glosbe – are all powered by information technology (IT) and are thus frequently referred to in the literature as IT translation tools.
However, in this study we refer to them as AI translation tools because our focus is specifically on their capabilities in terms of how they process and generate language – that is, how these systems perform tasks typically requiring human intelligence, such as language understanding, generation, contextualization, and cultural adaptability. While language processing is one of the most rapidly advancing fields within AI, and AI is increasingly permeating various aspects of life, one surprising area where AI adoption appears to be lagging is language didactics, with the EFL (English as a Foreign Language) classroom serving as a notable example. A considerable number of teachers still regard the implementation and integration of AI tools in the foreign language teaching process as a double-edged sword (this issue is discussed in detail in Section 4 below). Numerous recent studies suggest that the integration of AI tools into second/foreign language teaching should be viewed not merely as a passing trend but as an urgent and growing need (Crompton, Edmett, and Ichaporia 2023; Edmett et al. 2023; Vogt and Flindt 2023). As highlighted in our literature review, while the potential of AI is widely recognized by students and, to a certain extent, teachers, its application in classroom practice remains limited. One region that seems to be reversing this trend is Asia (Crompton, Edmett, and Ichaporia 2023). Meanwhile, most countries within the European Union are still awaiting national policy guidelines on the issue, with notable exceptions such as Sweden (Musk 2022) and the UK (Edmett et al. 2023). A recent study by Vogt and Flindt (2023) demonstrates that even low-threshold AI tools have been integrated into classroom practice only in a limited and hesitant manner, with a general tendency for such tools to be “ignored and excluded from language teaching” (2023, 2).
Furthermore, the integration of AI tools into EFL classrooms remains underexplored, despite its clear importance (Crompton, Edmett, and Ichaporia 2023; Vogt and Flindt 2023). This implies that we are neglecting the vast potential applications of AI tools in foreign language classrooms. AI-supported tools, particularly translation tools, can help students improve their language skills by offering instant translations, exposing them to diverse language structures, and providing immediate feedback (Dizon and Gayed 2021; Farrokhnia et al. 2023; Schmidt and Strasser 2022). Moreover, AI tools can provide personalized learning experiences, adapting to the individual needs of students and offering resources tailored to their specific learning levels (Okolo et al. 2024). AI chatbots like ChatGPT, for instance, can engage students in conversational practice, enhancing their fluency and comprehension (Crompton, Edmett, and Ichaporia 2023; Kazu and Kuvvetli 2023). Finally, AI tools can create innovative and stimulating learning environments and contexts, for example through virtual reality (Chen et al. 2022). Admittedly, the vast potential of integrating AI into EFL teaching comes with certain limitations. Setting aside the many ethical issues, which are outside the scope of this paper, one of the most significant challenges in integrating AI tools into foreign language pedagogy lies in the pragmatic and culturally embedded nature of language, particularly regarding idiomatic expressions and metaphors. As illustrated in the introduction, AI tools are not equipped to provide the meaningful, context-based, and possibly culture-specific interpretations necessary for appropriately translating or explaining culturally and/or contextually loaded phrases (see also Naveen and Trojovský 2024).
These tools often rely on literal translations, overlooking important nuances such as tone, cultural implications, and the pragmatic functions of language in real-world contexts. To reiterate our initial point, metaphors – acting as cognitive and cultural connectors in language – present a significant challenge for AI translation tools. One of the main messages we wish to convey is that this limitation need not be viewed solely in a negative light; instead, it can be regarded as a potentially valuable pedagogical tool. When considering the issue in the context of the critical role of error correction in the EFL classroom (see Khansir and Pakdel 2018), the potential of AI translation tools emerges as far more valuable than initially apparent. These tools are useful pedagogical resources not only when they provide accurate translations, but also when they fail to do so, since these failures can offer powerful opportunities for teaching about culture- and context-specific elements, for the critical evaluation of translations, and for learning overall. This is the central argument we will further explore in the discussion below. Before delving into that, in the central part of this study, we review our research examining student attitudes and habits regarding the use of AI translation tools, as well as the performance of these tools in translating metaphorical language.

3 The Study

Given that students, teachers, and translators alike face numerous challenges related to the use of AI translation tools in their everyday work, we decided to explore some of these pressing issues in greater detail and, at the very least, provide a more comprehensive framework for their future investigation. The study presented in this paper is motivated by the following questions:

1. What are the most widely used AI translation tools among Croatian students of English?
2. Which opportunities and challenges do students recognize regarding the use of these tools for translation purposes?
3. Are students actively encouraged to use AI tools in their translation work, and/or provided adequate guidance in this respect?
4. What guidance would they give to users of AI translation tools?
5. How do the most widely used AI tools cope with the translation of metaphors?
6. What are the implications of all the above for EFL classrooms?

The first four questions were investigated using a questionnaire. To address the fifth question, twenty metaphorical expressions – both cross-linguistically transparent and opaque – were run through the most popular AI translation tools (Google Translate, ChatGPT, and Glosbe), and their English-to-Croatian and Croatian-to-English translations were evaluated in terms of accuracy and cultural appropriateness. The final, sixth question is explored in the discussion section of the paper, drawing on the findings from the first five research questions.

3.1 Methodology

This study employed a mixed-methods approach, integrating both qualitative and quantitative techniques. The initial stage involved a comprehensive literature review, focusing on existing scholarly work related to the implementation of AI tools in EFL classrooms in general, and specifically for the purposes of metaphor translation. The goal of this stage was not only to synthesize existing scholarly discussions but also to assess the level of awareness and identify current areas of interest. This review served as both a stimulus and a foundation for the subsequent phases of the research. As noted in the previous section, this phase highlighted the fact that scholarly awareness regarding the need for and ways of integrating AI into EFL classrooms, particularly with respect to metaphor translation, remains extremely limited.
The next phase involved administering a mixed-format questionnaire (see Appendix), which included both closed- and open-ended questions. This questionnaire was used to collect primary data and assess the current situation regarding the use of AI tools by students in their everyday activities, both in and outside the classroom. The survey was completed by seventy-two university students enrolled in the undergraduate English program at the University of Rijeka, from two different years of study: forty-four students were in their first year, and twenty-eight were in their third year. The closed-ended questions aimed to gather quantitative data on the frequency of AI tool usage, exposure to these tools, and preferences for specific AI translation tools. The open-ended questions were designed to collect qualitative responses regarding participants’ views on various aspects of AI translation tools in the context of EFL teaching and learning. The final stage of the study involved a controlled experiment in which a set of metaphors, ranging from cross-linguistically transparent (i.e., having lexical and pragmatic translational equivalents) to cross-linguistically opaque (i.e., having no straightforward one-to-one translational matches), were translated from English to Croatian and vice versa using several AI translation tools (Google Translate, ChatGPT, and Glosbe). The translations were evaluated one by one based on accuracy, fidelity to the source meaning, and cultural appropriateness of the target expression. The results of these translations were subsequently discussed in class.

3.2 Results

The results are presented below in the order of their appearance on the questionnaire. The responses to the quantitative questions are reported and displayed in graphs, while the answers to the qualitative questions are organized and presented based on the frequency of recurring response themes.
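The per-item scoring in the controlled experiment can be pictured as a simple tally of manual verdicts per tool. The sketch below is purely illustrative: the `ratings` entries, metaphor labels, and verdict categories are hypothetical placeholders, not the study's actual data, which was rated by hand for twenty expressions.

```python
from collections import defaultdict

# Hypothetical manual verdicts for (tool, metaphor) pairs; the real study
# rated translations for accuracy, fidelity, and cultural appropriateness.
ratings = {
    ("ChatGPT", "switching points"): "too_literal",
    ("Google Translate", "switching points"): "too_literal",
    ("Glosbe", "switching points"): "too_literal",
    ("ChatGPT", "metaphor_2"): "adequate",
    ("Google Translate", "metaphor_2"): "too_literal",
    ("Glosbe", "metaphor_2"): "adequate",
}

# Tally how often each tool produced an adequate rendering.
per_tool = defaultdict(lambda: {"adequate": 0, "total": 0})
for (tool, _metaphor), verdict in ratings.items():
    per_tool[tool]["total"] += 1
    if verdict == "adequate":
        per_tool[tool]["adequate"] += 1

for tool, tally in sorted(per_tool.items()):
    print(f"{tool}: {tally['adequate']}/{tally['total']} adequate")
```

Keeping the verdicts as labelled categories rather than numeric scores mirrors the qualitative evaluation described above and leaves room for finer-grained labels (e.g., separating lexical from pragmatic failures) later.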
3.2.1 Use of AI Translation Tools (Question 1)

Both first-year and third-year students reported using AI translation tools in their everyday life and academic work. Accordingly, the answer to Question 1, which asked whether students used AI translation tools, was 100% “yes” in both groups.

3.2.2 Most Widely Used AI Translation Tools (Question 2)

When asked about the most widely used AI translation tools among Croatian students of English at the University of Rijeka, the findings revealed significant differences between first-year and third-year students in their software preferences.

• First-Year Students: Among the first-year students (n = 44), Google Translate emerged as the overwhelmingly preferred translation tool, with 42 students reporting its use. Other tools, such as ChatGPT (8), Glosbe (5), and DeepL (2), were used to a much lesser extent.

• Third-Year Students: In contrast, third-year students (n = 28) favoured Glosbe (24), with Google Translate (17) as the second most popular option. Additionally, online dictionaries (7), DeepL (6), and ChatGPT (4) were also commonly used, while Eudict and Reverso each had one user.

We note that the total number of responses for this question exceeds the number of students who completed the questionnaire. This is because many students reported using multiple translation tools, a behaviour more prevalent among third-year students.

Table 1. Preferred translation tools.

Translation Tool        First-Year Students (n = 44)   Third-Year Students (n = 28)
Google Translate        42                             17
Glosbe                  5                              24
ChatGPT                 8                              4
DeepL                   2                              6
Online Dictionaries     0                              7
Eudict                  0                              1
Reverso                 0                              1

3.2.3 Guidance on AI Translation Tools (Questions 3 and 4)

The responses to Questions 3 and 4 reveal a notable contrast between first-year and third-year students regarding guidance on AI translation tools.
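Because Question 2 allowed multiple answers, the figures in Table 1 count mentions rather than respondents. A minimal sketch of that aggregation, using made-up responses rather than the survey data:

```python
from collections import Counter

# Made-up multi-select answers: each respondent may list several tools,
# so the mention counts can sum to more than the number of respondents.
responses = [
    ["Google Translate"],
    ["Google Translate", "ChatGPT"],
    ["Glosbe", "Google Translate"],
    ["Glosbe", "DeepL"],
]

# Flatten every respondent's list and count each tool mention once.
mentions = Counter(tool for answer in responses for tool in answer)
print(mentions.most_common())
```

Here Google Translate is mentioned three times by only four respondents; counting mentions in this way is why the column totals in Table 1 exceed the group sizes of 44 and 28.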
• First-Year Students: The majority of first-year students (n = 44) reported that they had not received any guidance on the availability or use of AI translation tools during the first three months of their university education, or prior to that.4 This lack of guidance was reflected in their negative responses.

• Third-Year Students: In contrast, the third-year students (n = 28) were divided, with 7 students indicating that they had received guidance on AI translation tools at the university level. This was an unexpected finding, but further investigation revealed that all seven of these students had German as their second major. As part of their German courses, they were introduced to Glosbe, which they subsequently shared with their classmates, explaining the high popularity of Glosbe among third-year students.

3.2.4 Benefits of AI Translation Tools (Question 5)

Regarding the benefits of using AI translation tools, students from both the first and third years highlighted similar advantages, with speed and the multiplicity of options being the most frequently mentioned. Other noted benefits include the following:

• Simplicity and ease of use
• The availability of alternatives, allowing users to choose the best solution
• Tools that sometimes remind users of the appropriate translation
• Vocabulary expansion
• Potential as a learning tool

Since the advantages identified by first-year and third-year students did not differ significantly, the results have been combined into one graph below.

3.2.5 Challenges in Using AI Translation Tools (Question 6)

When it comes to challenges that may hinder the use of AI translation tools, the respondents were almost unanimous in identifying accuracy and issues related to (de)contextualization as major concerns.
Specific challenges mentioned included the following:

• Problems with idioms and metaphors
• Issues with collocations
• Difficulty handling cultural allusions
• Struggles with the pragmatic aspects of language, such as colloquial expressions, proverbs, and personal perspectives

Additionally, some students mentioned less common challenges, such as the following:

• Problems with syntax
• Insufficient data for Croatian, resulting in poorer performance when translating from or into Croatian compared to “major” languages
• A lack of habit in using physical dictionaries and other literature
• A potential narrowing of creativity and autonomy owing to reliance on IT tools

No significant differences were noted between the first- and third-year respondents.

4 The questionnaire was administered in late December 2024, by which time first-year students had completed three months of lectures, as the academic year begins in October. In terms of guidance on the use of AI translation tools, respondents were instructed to consider any form of guidance they had received, both during their time at university and prior to enrolment, at any level of education.

3.2.6 Use of IT Tools for Translating Metaphors (Question 7)

Regarding whether students would use IT tools for translating metaphors, the responses revealed a clear ‘no’ from the first-year students and a more divided response from the third-year students.

• First-Year Students: All 44 first-year students (100%) stated that they would not use IT tools for translating metaphors.

• Third-Year Students: Among the 28 third-year students, 21 (75%) preferred not to use IT tools for translating metaphors, while 7 (25%) were in favour of using them.

It is important to note that while first-year students simply answered ‘no’ to the question of using IT tools for metaphor translation, third-year students often elaborated on both their ‘yes’ and ‘no’ responses.
Positive responses were frequently qualified with phrases such as “Yes, but…” or “Yes, if…”. Common elaborations included the following:
• “Yes, but with caution/guidance/care/a pinch of salt.”
• “Yes, if I cannot think of an equivalent myself / if I am unfamiliar with the source language expression.”
• “Yes, but always double-checking the proposed translation with authentic target language data, teachers, or native speakers.”
On the other hand, most negative responses were followed by explanations that reiterated the limitations of AI translation tools, particularly when dealing with the idiomatic, metaphoric, pragmatic, and cultural aspects of translation – issues already highlighted in the responses to Question 6 (above).

3.2.7 Guidance for Using AI Translation Tools as Future EFL Teachers (Question 8)

Finally, when it comes to the guidance that first- and third-year students, as future English as a Foreign Language (EFL) teachers, would give to their students regarding the use of AI translation tools, the responses were grouped based on common themes, which were then ranked according to their frequency. The most frequent themes identified are listed below, from most to least common:
First-Year Students:
• Use IT tools as a help rather than relying exclusively on them
• Always double-check the solutions proposed by IT tools, as they can make many mistakes
• Compare different sources
• First, study and understand the language, and only then use IT tools
• Use with great caution or moderation; do not overuse them (they will make you “dumb”)
Responses that were unique or only appeared once include:
• Do not use for sentences, only for words
• Do not use them, as they are above your level
• Use them because they can help you better understand a text in a foreign language
• Use them only as an exercise in recognizing mistakes commonly made by others
Third-Year Students:
• Use IT tools, but do not abuse them; they are just a tool, a help, support
• Use more than one IT tool and check the different solutions; triple fact-check; compare and contrast different solutions; always remember that these tools make mistakes all the time
• Do not use IT tools if your knowledge is poor
• Learn to think critically about the proposed translations
• First, learn to work without their help
• I would discourage their use; use as a last resort
• If you get paid for a translation, you must do it yourself – anyone can feed a source language into a tool, but that is not the point
• Best for specific/technical vocabulary
We immediately observe that the responses from third-year students reveal more pedagogically oriented comments compared to those from first-year students. While first-year students mainly focus on cautionary advice, such as using the tools sparingly and cross-checking results, third-year students demonstrate a deeper understanding of the broader educational implications of IT tool usage. They emphasize critical thinking, the importance of building foundational knowledge before relying on tools, and the need to use these resources as a last resort or for specific tasks. This shift toward more pedagogically sound advice reflects their growing awareness of the role of an EFL teacher and their understanding of how to guide students effectively in the classroom. This topic will be discussed in more detail in the next section.
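The grouping-and-ranking step described above amounts to a frequency count over manually coded responses. As a purely illustrative sketch – the theme labels below are invented stand-ins, not the study’s actual codes – the ranking could be computed like this:

```python
from collections import Counter

# Hypothetical coded responses: each student answer has been manually
# assigned one theme label. The labels are invented for illustration.
coded_responses = [
    "use as help, not replacement",
    "double-check outputs",
    "use as help, not replacement",
    "compare different sources",
    "double-check outputs",
    "use as help, not replacement",
]

# Rank themes from most to least common, as described for Question 8.
ranked = Counter(coded_responses).most_common()
print(ranked)
```

The manual coding itself remains a human judgment; only the tallying and ordering are mechanical.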
3.2.8 Accuracy of AI Translation Tools in Translating Metaphors

In order to explore the level of accuracy demonstrated by a selected set of AI tools in translating metaphors, we created two sets of metaphorical expressions: one containing 10 cross-linguistically transparent metaphoric expressions (i.e., metaphorical phrases that have lexical and pragmatic equivalents in both languages), and the other set containing 10 cross-linguistically opaque metaphoric expressions (i.e., metaphorical phrases with no direct one-to-one translational equivalents). These two sets are exemplified below.
Cross-linguistically transparent metaphorical phrases:
1. A double-edged sword – Mač s dvije oštrice (meaning: something that has both positive and negative consequences or effects);
2. The tip of the iceberg – Vrh ledenog brijega (meaning: what is visible or known is just a fraction of the whole);
3. A wolf in sheep’s clothing – Vuk u janjećoj koži5 (lit. ‘Wolf in lamb’s skin’ – meaning: someone who appears to be harmless or trustworthy is actually dangerous and deceitful);
4. To be on cloud nine – Biti na devetom nebu (meaning: extremely happy or in a state of bliss).
Cross-linguistically opaque metaphorical phrases:
1. A silver lining – U svakom zlu neko dobro (lit. ‘In every evil, there is some good’) (meaning: a positive or hopeful aspect to a generally negative or difficult situation);
2. Caught between a rock and a hard place – Između čekića i nakovnja (lit. ‘Between the hammer and the anvil’) (meaning: to be caught in a position where one is pressed from two sides, with no easy way out);
3. Like two peas in a pod – Kao dvije kapi vode (lit. ‘Like two drops of water’) (meaning: two things or people that are extremely similar or identical);
4. Miss the boat / the ship has sailed – Prošla baba s kolačima (lit.
‘the grandmother with the cakes has passed/left’) / Prošao voz (lit. ‘the train has passed’) (meaning: an opportunity has been missed, and it is too late to act now).

All expressions from both sets were translated from English to Croatian and vice versa using three AI translation tools: Google Translate, ChatGPT, and Glosbe. All translations were first evaluated by the teacher for accuracy, fidelity to the source meaning, and cultural appropriateness of the target expression. The results of these translations were subsequently discussed with the students in class.6

5 To give a good example of over-literal translation, let us note here that although both English and Croatian use this idiom, and the metaphorical mapping between good and evil is clear and frequently used, ChatGPT translates the English expression ‘wolf in sheep’s clothing’ as ‘wolf in sheep’s skin’, replacing the culturally usual Croatian ‘lamb’s skin’ with a literal translation of the English ‘sheep’.
6 The evaluations of the translations in class were done after the students had completed the questionnaire.

The results we obtained when translating our two sets of metaphors align closely with the findings presented by Gašparović, Brala-Vukanović, and Brkić-Bakarić (in prep.). In fact, the accuracy of translations of expressions that contain metaphorical language (either metaphorical descriptions or conventional metaphors embedded in idioms, proverbs, etc.) seems to be influenced by three key factors: a) the degree of equivalence between source and target metaphoric expressions; b) the source and target language pair; and c) the translation platform. More specifically, in terms of platform performance, ChatGPT demonstrates a higher level of translation accuracy compared to Google Translate, which, in turn, outperforms Glosbe. It
is worth noting that the study by Gašparović, Brala-Vukanović, and Brkić-Bakarić (in prep.) indicates that Microsoft Copilot surpasses ChatGPT in translation accuracy. However, since our participants did not use Microsoft Copilot, it was not included in this study. Additionally, translation accuracy tends to be higher when translating from Croatian to English rather than from English to Croatian. This is unsurprising, as AI translation tools are trained on large datasets: translating from a language with moderate data resources, such as Croatian, into a language with extensive data resources, like English, poses less of a challenge for AI systems than translating in the reverse direction. Finally, as anticipated, metaphorical expressions with close translational equivalents in the target language were translated more accurately than those without such close (lexical, semantic, and pragmatic) equivalents. Considering the variability in translation quality – from highly accurate and contextually appropriate translations to completely inaccurate or unacceptable ones – the role of the teacher in this process becomes crucial. This issue is discussed in detail in the next section of this paper.

4 Discussion

The data presented in Section 3 reveal interesting trends in the use of AI translation tools among Croatian students of English, with notable differences between first-year and third-year students. These differences reflect both the development of various linguistic and pedagogical skills and the evolving relationship students have with these tools as they progress through their studies. The findings emphasize a (potential) critical link between linguistic competence and reliance on IT (translation) tools. Additionally, they underscore the urgent and growing need for more structured guidance on the use of AI translation tools in the EFL classroom.
At this point, an important observation is that while the use of AI translation tools is widespread among students, this practice is not always mirrored in teaching practices or reflected in the national curriculum. In Croatia, there seems to be a disconnect between the rapid development of these tools and the pace at which educators are integrating them into the EFL classroom. One plausible explanation for this disconnect lies in the generational gap between students, who are more inclined to use technology in general, and teachers, who may be less familiar with or receptive to these tools. Interestingly, while everyday practice shows that teachers frequently use AI translation tools outside the classroom for their own purposes, our data suggest that they remain reluctant to incorporate these technologies into their teaching. This reluctance may stem from concerns about AI tools replacing traditional teaching methods, or from teachers feeling inadequately prepared to integrate them into their repertoire of pedagogical tools (Edmett et al. 2023; Vogt and Flindt 2023). Furthermore, this lack of integration of AI tools in language education is at least in part due to the absence of structured guidance or curricular support for AI integration in the EFL classroom. Given students’ natural inclination towards IT, and given that research has demonstrated the potential benefits of AI translation tools in language learning, including vocabulary improvement, stylistic refinement, and even anxiety control (Crompton, Edmett, and Ichaporia 2023), this lack of structured guidance undermines the many potential pedagogical benefits these tools could provide for both teachers and students.
In particular, deeper analysis and guidance are needed to clearly understand and describe the tasks that AI tools can handle effectively, how these tasks can be incorporated into the classroom, and what exactly remains the role of the teacher and teacher expertise (for a comprehensive discussion of this point, see Edmett et al. 2023). In the context of the current topic, i.e. AI translation of metaphors, understanding how metaphor translation tasks can be incorporated into EFL teaching, and how students and teachers could benefit from this, would allow AI to serve as a useful pedagogical tool. In fact, while our study has shown that AI translation tools may often underperform with metaphorical expressions, teachers should be made aware that AI’s failures in accuracy can be seen as valuable teaching opportunities. In cases of inaccurate AI translation output (e.g., results that are too literal or contextually inappropriate), a detailed error correction process can serve not just to improve students’ language skills, but also to foster students’ cultural sensitivity and their ability to critically evaluate AI translation tools and their output from diverse perspectives.

The lack of curricular guidance on the use of IT (translation) tools means that their pedagogical integration in Croatia remains largely informal and individualized. At the same time, as our data show, when guidance is provided, it is generally well received by students. In fact, while first-year students in our study primarily rely on Google Translate as a convenient tool, third-year students exhibit more varied use, with Glosbe emerging as the most popular choice. This shift is likely due to the guidance they received in their German classes, highlighting how even a minor introduction to AI tools can significantly influence students’ perceptions and usage.
The success of Glosbe among third-year students underscores the importance of structured guidance in AI tool integration, as it can influence how students navigate and evaluate the performance of different translation tools. Moreover, since AI tools develop at varying rates (some showing a decline in performance over time – cf. Gašparović, Brala-Vukanović, and Brkić-Bakarić (in prep.)), it is crucial that students are taught to critically assess and reassess these tools continuously.

Our data also show an interesting evolution in students’ critical attitudes toward AI tools as they advance from the first to the third year, even without structured, formal pedagogical guidance in this respect. While first-year students often approach AI tools with caution, third-year students show a more nuanced understanding of these tools’ limitations, especially in the translation of culturally specific content, such as metaphors.7 This progression seems to reflect their trial-and-error experiences with translation tools, and only sporadic, informal guidance from individual teachers. The third-year students, despite lacking formal pedagogical training, demonstrate a critical approach, acknowledging the inherent limitations of these tools and the necessity of double-checking translations. In this regard, integrating cultural awareness training into the use of AI translation tools – through error analysis and correction of idioms and metaphors, for example – constitutes a step in the right direction.

7 For an interesting study on how students working in groups negotiate limitations in the use of Google Translate, see Rowe (2022).
The current situation, in which the development of critical thinking and critical evaluation skills regarding these tools is left to students’ individual trial-and-error experiences and intuitive awareness, is certainly not the most effective pedagogical method. Given all the above, it would be fair to say that AI translation tools can undoubtedly be useful for basic translation tasks, and students should be encouraged to use them as a complement to their own language skills, rather than a substitute. Instead of simply proposing IT translation tools as potential help, teachers should actively guide students in the practical application of these tools, crucially focusing on their potential as excellent – and de-personalized – error-correction pedagogical opportunities. Notably, the de-personalized nature of AI’s inadequate or inaccurate translations eliminates the personal failure moments that typically occur in a real classroom, thus removing the potential for demotivating students or exposing them to negative and stressful emotions. This is a crucial factor that should not be overlooked when analysing the future role of AI (translation) tools in the EFL classroom.

Providing opportunities for students to critically assess the quality of (a variety of) translations, especially when dealing with idiomatic or metaphorical language, can help improve not only their linguistic and translation skills, but also their wider understanding of the cultural and contextual nuances of language and communication. Error correction as a pedagogical tool has a valid ally in IT. While AI-driven tools can and do make mistakes, the human ability to critically analyse and correct these mistakes, as well as to learn from them, offers an efficient way to engage deeply and productively with language and the translation process in the context of AI tools.
5 Concluding Remarks

In light of the discussion above, it is evident that the integration of AI translation tools into the EFL curriculum is essential and urgently needed. This integration would not only reflect students’ current habits, interests, and needs, but also allow for a more structured and effective use of these tools in language learning. Several key points need to be addressed in this regard: 1) the structured inclusion of AI tools in the curriculum and teacher training on their pedagogical use; 2) continuous monitoring of the development of AI translation tools; and 3) fostering critical thinking in students, particularly in relation to translating culturally specific language.

While AI translation tools offer both opportunities and challenges, their current lack of integration in the classroom limits their potential. By fostering critical thinking and promoting a balanced use of these tools, educators can enhance students’ translation abilities, especially when dealing with the complexities of culturally embedded language, such as metaphors. Even though AI translation tools often struggle with culture-specific expressions, these shortcomings provide excellent opportunities for error-correction-based pedagogy. As AI technology continues to evolve, it is crucial that educators – and before them education policy makers – adopt a proactive, structured approach to integrating these tools into the curriculum. The future of translation and language teaching is inevitably intertwined with AI, and it is crucial that both teachers and students are equipped to navigate this evolving landscape and use these tools in an informed and guided way, so as to make full use of their potential and avoid possible traps.8 Further research is needed to explore how AI translation tools can be more effectively integrated into language learning curricula.
Given their rapid development, investigating exactly how AI tools can be tailored to address specific challenges – such as translating metaphors and other culturally embedded language features – will be crucial. Understanding which pedagogical tasks are best suited for AI and which should remain the (sole) responsibility of the teacher will provide valuable insights into how educators can best prepare students for the challenges and opportunities AI tools present. Ultimately, the gap between AI’s potential and its actual integration in EFL classrooms is not only a technological issue but also a pedagogical one. Future research and teacher training should address both the practical and theoretical aspects of AI tool use in language education. Training programs should not only focus on how to use these tools effectively but also emphasize fostering students’ critical engagement with them. Teaching students to understand the limitations of AI tools, and using these limitations as a pedagogical opportunity, will enhance their language skills and ensure that AI becomes a valuable, guided resource in the EFL classroom.

In conclusion, while many of the issues discussed remain at an intuitive level, their importance and relevance demand immediate scholarly attention and structured curricular guidance. The time to act is now – further delays in integrating AI translation tools into EFL teaching would mean missing a critical opportunity to significantly enhance language learning and teaching in the digital age.

References

Boroditsky, Lera, and Alice Gaby. 2010. “Remembrances of times east: Absolute spatial representations of time in an Australian Aboriginal community.” Psychological Science 21 (11): 1635–39. https://doi.org/10.1177/0956797610386621.
Crompton, Helen, Adam Edmett, and Neenaz Ichaporia. 2023. Artificial Intelligence and English Language Teaching: A Systematic Literature Review. British Council.
Dizon, Gerald, and Jamal M. Gayed. 2021.
“Examining the impact of Grammarly on the quality of mobile L2 writing.” JALT CALL Journal 17 (2): 74–92. https://doi.org/10.29140/jaltcall.v17n2.336.
Edmett, Adam, Neenaz Ichaporia, Helen Crompton, and Ross Crichton. 2023. Artificial Intelligence and English Language Teaching: Preparing for the Future. British Council. https://www.teachingenglish.org.uk/sites/teacheng/files/2024-08/AI_and_ELT_Jul_2024.pdf.
Farrokhnia, Mohammad, Sima K. Banihashem, Omid Noroozi, and Allan Wals. 2023. “A SWOT analysis of ChatGPT: Implications for educational practice and research.” Innovations in Education and Teaching International 61 (3): 460–74. https://doi.org/10.1080/14703297.2023.2195846.
Gartner, Smiljana, and Marjan Krašna. 2023. “Artificial intelligence in education – ethical framework.” 12th Mediterranean Conference on Embedded Computing (MECO), Budva, Montenegro, 1–7. https://doi.org/10.1109/MECO58584.2023.10155012.

8 Among the many challenges discussed in this study, one has remained outside its primary focus – namely, the ethical considerations. Although the present research does not explicitly address the ethical dimensions of incorporating AI technologies into the classroom, this exclusion should not be interpreted as suggesting that the issues explored here can be considered in isolation from the significant ethical concerns that accompany them. For a comprehensive and recent discussion of ethical considerations related to the integration of AI tools into pedagogical processes, see Gartner and Krašna (2023).

Gašparović, Marijana, Marija Brala-Vukanović, and Marija Brkić-Bakarić. In prep. “Idioms and machine translation: Assessing translation accuracy in the age of AI.” Paper submitted to CALICO Journal.
Healey, Justin. 2020. Artificial Intelligence. Volume 450. The Spinney Press.
Kazu, Ibrahim Yasar, and Murat Kuvvetli. 2023.
“The influence of pronunciation education via artificial intelligence technology on vocabulary acquisition in learning English.” International Journal of Psychology and Education Studies 10 (2): 480–93. https://ijpes.com/index.php/ijpes/article/view/1044.
Khansir, Ali A., and Farhad Pakdel. 2018. “Place of error correction in English language teaching.” Educational Process: International Journal 7 (3): 189–99. https://doi.org/10.22521/edupij.2018.73.3.
Lakoff, George, and Mark Johnson. 1980. Metaphors We Live By. University of Chicago Press.
Maier, Martin, and Rasha Abdel Rahman. 2024. “Transient and long-term linguistic influences on visual perception: Shifting brain dynamics with memory consolidation.” Language Learning 74 (1): 157–84. https://doi.org/10.1111/lang.12631.
Musk, Nicholas. 2022. “Using online translation tools in computer-assisted collaborative EFL writing.” Classroom Discourse 13 (2): 119–44. https://doi.org/10.1080/19463014.2021.2025119.
Naveen, Palanichamy, and Pavel Trojovský. 2024. “Overview and challenges of machine translation for contextually appropriate translations.” iScience 27 (10): 110878. https://doi.org/10.1016/j.isci.2024.110878.
Okolo, Chinwe Jane, Chinyere Grace Ezeonwumelu, Chioma Ihuoma Barah, and Ugwu Nnenna Jovita. 2024. “Language education in the age of AI: Opportunities and challenges.” Newport International Journal of Research in Education 4 (1): 39–44. https://doi.org/10.59298/NIJRE/2024/41139448.
Rowe, Lindsey W. 2022. “Google Translate and biliterate composing: Second‐graders’ use of digital translation tools to support bilingual writing.” TESOL Quarterly 56 (3): 883–906. https://doi.org/10.1002/tesq.3143.
Schmidt, Thomas, and Thomas Strasser. 2022. “Artificial intelligence in foreign language learning and teaching: A CALL for intelligent practice.” International Journal of English Studies 33 (1): 165–84.
Vogt, Kerstin A., and Lars Flindt. 2023.
“Artificial intelligence and the future of language teacher education: A critical review of the use of AI tools in the foreign language classroom.” In The Future of Teacher Education: Innovations Across Pedagogies, Technologies, and Societies, edited by P. Hohaus and J.-F. Heeren, 179–99. Brill.
Whorf, Benjamin Lee. 1956. Language, Thought, and Reality: Selected Writings of Benjamin Lee Whorf. Edited by John B. Carroll. MIT Press.

APPENDIX – Questionnaire Administered to Students

1. Do you use IT/computer-assisted translation tools in your work?
2. If yes, which AI translation tool do you use most frequently?
3. Have you ever been exposed to AI translation tools in your English classes?
4. Have you received any guidance from your university lecturers regarding the use of AI translation tools?
5. In your view, what are the benefits of using AI translation tools?
6. In your view, what are the limitations of using AI translation tools?
7. Do you rely on IT tools for the translation of metaphors?
8. In your view, how efficient are IT tools when translating metaphors?
9. As future EFL teachers, what guidelines would you give your students regarding the use of AI translation tools?

Ghodrat Hassani, Marziyeh Malekshahi, Hossein Davari
Damghan University, Iran
2025, Vol. 22 (1), 203-221(228)
journals.uni-lj.si/elope
https://doi.org/10.4312/elope.22.1.203-221
UDC: [81’25:004.89]:659.1(55)

AI-Powered Transcreation in Global Marketing: Insights from Iran

ABSTRACT

This study examines AI-powered transcreation’s role in improving cross-cultural brand communication. We employed GPT-3 to evaluate AI’s ability to enhance global marketing through improved translation and adaptation of brand messages. Traditional translation methods often fail to capture brand-specific emotional resonance across cultures, but AI tools may address this challenge.
Our research compared 10 translation students and 10 professional translators in translating/transcreating brand taglines from Persian to English. An initial test without AI showed professionals outperforming students. After six weeks of GPT-3 training, however, students surpassed professionals, as judged by expert raters using standardized criteria. The findings indicate that targeted AI training can improve transcreation quality. The study also underscores the value of human judgment in crafting prompts and choosing optimal AI outputs. These results also offer insights for translation education, professional training, and global marketing strategies.

Keywords: copywriting, GPT-3, large language model (LLM), marketing translation, transcreation

Transkreacija z umetno inteligenco v globalnem marketingu: spoznanja iz Irana

IZVLEČEK

Študija ugotavlja, kako lahko transkreacija (oz. preustvaritev), podprta z umetno inteligenco, izboljša medkulturno komuniciranje blagovne znamke. Z GPT-3 smo ovrednotili zmožnost UI, da s pomočjo izpopolnjenega prevoda in priredbo oglasnih sloganov okrepi trženje blagovne znamke. Klasični prevajalski pristopi pogosto ne zajamejo kulturno specifične čustvene note posamezne blagovne znamke, medtem ko se orodja UI s tem izzivom lahko spoprimejo. V raziskavi je 10 študentov in študentk prevajanja in 10 profesionalnih prevajalcev in prevajalk prevajalo/transkreativno prilagajalo perzijske oglasne slogane v angleščino. V začetnem preizkusu brez pomoči UI so bili profesionalni prevajalci in prevajalke uspešnejši, po šesttedenskem usposabljanju za delo z GPT-3 pa so študenti in študentke po oceni strokovne komisije, ki je upoštevala standardizirana merila, prehiteli profesionalce. Rezultati kažejo, da ciljno usposabljanje za delo z UI izboljša kakovost transkreacije.
Študija kaže tudi na pomen človeške presoje pri oblikovanju napotkov in izbiri optimalnih odgovorov UI ter nudi tudi nove vpoglede za izobraževanje prevajalcev in prevajalk, strokovno usposabljanje in globalne tržne strategije.

Ključne besede: pisanje oglasnih besedil, GPT-3, velik jezikovni model (LLM), tržno prevajanje, transkreacija

1 Introduction

Global brands face growing challenges in cross-cultural marketing communication, needing more than linguistic translation to ensure brand messages resonate across diverse markets while retaining their core identity. Transcreation, a specialized translation process, adapts marketing content to align with cultural contexts while preserving emotional impact (Díaz-Millón 2021, 159; Díaz-Millón and Olvera-Lobo 2021, 354). However, traditional translation methods often fail to convey brand-specific emotional resonance across cultures, posing a significant obstacle for global marketing. Skilled human translators can address this through careful cultural adaptation, but the process is time-consuming and inconsistent. This challenge has intensified with the rising demand for multilingual content across platforms, from technical documentation to social media (Nimdzi Insights 2022). Large language models (LLMs) like GPT-3 offer a potential solution, with advanced capabilities in generating and adapting natural language across languages. Yet their application to transcreation in marketing remains underexplored, especially regarding how AI tools might improve translator performance in cross-cultural brand communication. This study fills this gap by examining AI-powered transcreation’s role in enhancing cross-cultural brand messaging, focusing on translations from Persian to English for English-speaking North American audiences.
We investigate whether GPT-3-assisted tools can improve the shift from mere translation to effective transcreation of marketing content. The research compares translation students and professional translators in translating/transcreating brand taglines from Persian to English, first without AI support and then after providing the students with targeted training in GPT-3-powered tools. This study’s significance lies in its insights into how AI technologies could reshape translation workflows, particularly in marketing, where cultural nuance is critical. As brands aim to engage diverse global markets, effective AI-assisted transcreation methods could improve cross-cultural communication, potentially lowering costs and increasing efficiency.

2 Translation in an Automated Age

Advanced translation technologies, such as neural machine translation, have raised fears that human translators may become obsolete as automation disrupts the industry (Cronin 2013). Recent scholarship, however, tempers this view. Moorkens (2020) points out ongoing limitations in machine translation, while Pielmeier and O’Mara (2020) highlight new hybrid roles where translators work alongside AI. Although the long-term impact is unclear, translators must adapt to remain relevant in a technology-driven field. AI’s expanding role in areas like legal, medical, marketing, and technical translation makes resisting this technological shift increasingly impractical (Cronin 2013; Łukasik 2024). This change offers benefits, including improved productivity, new market opportunities, and specialized roles, but also presents challenges such as pricing pressures, shifting skill requirements, job security concerns, and the need for continuous training (Olohan 2017).

As businesses aim to connect with global audiences, the rising demand for multilingual content – covering technical manuals, instructional documents, marketing materials, and social media – emphasizes the importance of culturally tailored communication for effective global branding (Way 2020). AI contributes significantly to this field by providing fast, cost-effective, and scalable translation solutions, with the AI language translation market expected to reach USD 7.16 billion by 2029, growing at a 25% CAGR (The Business Research Company 2025). Traditional translation methods, often slow and resource-heavy, are being transformed by AI, particularly generative tools powered by LLMs. These systems speed up content drafting and adapt tone, style, and language to specific needs (Nimdzi Insights 2024). Leading LLMs, such as OpenAI’s GPT, Google’s Gemini, Meta’s Llama, xAI’s Grok, Cohere, Mistral, and Anthropic’s Claude, support over two dozen languages, enabling simultaneous content generation in multiple languages from a single input (Nimdzi Insights 2023). The integration of AI, particularly LLMs, streamlines multilingual content production, enabling businesses to communicate more efficiently with global audiences. In 2024, the Nimdzi 100 report highlighted that 67% of language service providers (LSPs) utilized generic, out-of-the-box AI solutions, such as ChatGPT, while 55% integrated LLMs via APIs into their workflows, significantly enhancing the speed and scalability of AI-generated translations (Nimdzi Insights 2025, 72).

The rise of global brands has increased demand for transcreation, which adapts branded messages to resonate emotionally across cultures (Torresi 2010). Unlike direct translation, transcreation requires expertise in international branding, cross-cultural psychology, and search engine optimization (Mitchell-Schuitevoerder 2020). This need is especially evident in digital spaces, where social media enables brands to engage international audiences directly.
Effective localization shapes consumer perceptions and purchasing decisions, prompting translation firms to offer multilingual copywriting for digital platforms (Nimdzi Insights 2022). Although many translators lack advanced marketing expertise, AI tools like GPT-3 offer a solution. By processing large datasets of marketing content, these models generate localized suggestions that align with target audiences’ emotional expectations. Human translators provide essential oversight, refining AI outputs, ensuring cultural accuracy, and enhancing textual precision. This collaborative approach improves marketing content quality while reducing the need for translators to have extensive marketing knowledge.

3 Transcreation for Marketing Translation

Transcreation goes beyond traditional translation by creatively adapting marketing messages to resonate deeply across cultural boundaries while adhering to legal mandates, such as France’s Toubon Law requiring French in commercial communications. Unlike conventional translation, which seeks to preserve the original message, transcreation reinterprets its essence to suit the target audience’s linguistic and cultural context while maintaining its emotional impact (Díaz-Millón 2021, 159; Díaz-Millón and Olvera-Lobo 2021, 354). Katan (2016, 377) describes this as a “transcreational turn” in translation, emphasizing re-creation in fields like advertising and localization. Katan (2018) further distinguishes transcreation from translation, presenting translators as creators, especially in culturally sensitive contexts. Additionally, Katan and Taibi (2021) frame transcreation within cultural mediation, offering insights into its theoretical and practical roles in translation.

G. Hassani, M. Malekshahi, H. Davari
AI-Powered Transcreation in Global Marketing: Insights from Iran
This perspective builds on Katan’s (2013) exploration of intercultural mediation, which underscores the translator’s role in bridging cultural gaps to facilitate effective communication in diverse settings. This flexible process allows skilled translators to demonstrate creativity by reframing messages to engage local audiences effectively. Transcreation aims to produce content with equivalent impact and emotional connection in another language, even if the text diverges significantly from the original (Bowker 2023, 129). However, this assumption of equivalence raises questions: Can emotional impact be fully replicated across cultural and linguistic divides? Katan (2001) highlights the importance of intercultural competence in ensuring that such adaptations respect cultural differences without compromising the message’s intent, suggesting that effective transcreation requires a nuanced understanding of cultural dynamics. Metrics for assessing equivalence remain subjective and underexplored. Although practitioners may assert emotional parity, empirical studies on cross-cultural audience responses are scarce, challenging the notion of achieving identical emotional resonance in diverse contexts. For instance, Nike’s global campaign in France did not directly translate its slogan Just Do It. Instead, it transcreated it as Fais-le (“Do it”), complying with France’s Toubon Law (Law 94-665 of 1994). While retaining the original’s call to action, Fais-le is more concise and commanding, omitting the “just” qualifier. This creates a bolder, more urgent tone deemed better suited to French cultural preferences, enhancing its motivational impact. However, the nuance of “just,” which softens the encouragement in English, is lost, making Fais-le more directive. Achieving this balance between creative adaptation and fidelity to the original message is critical, as overly free reinterpretation risks diluting the message’s essence. 
Determining appropriate boundaries requires careful human judgment. Systemic functional linguistics, particularly appraisal theory (Martin and White 2005), offers tools to analyse attitudinal language, intensity, and audience engagement. Recent studies using these frameworks reveal how transcreation reconstructs emotional resonance and persuasive meanings across languages (Ho 2024). This balance between creative adaptation and fidelity to the original prompts investigation into whether LLMs like GPT-3 can assist translators in generating varied, culturally tailored marketing copy for further refinement and selection.

4 GPT-3: A Generative Transformer Model Within the Broader LLM Landscape

GPT-3 (Generative Pre-trained Transformer 3), developed by OpenAI in 2020, marks a significant advance in autoregressive generative language models, a subset of natural language processing (NLP) systems. This neural network produces human-like text across varied contexts, but it is only one approach within the diverse field of LLMs. Other architectures serve distinct purposes: encoder-only models like BERT excel in classification and named entity recognition, while encoder-decoder models like T5 handle both classification and generation. This study focuses on generative models like GPT-3 for their relevance to creative marketing translation tasks.

At its core, GPT-3 predicts the next token or sequence based on patterns in its training data, enabling coherent and contextually appropriate text generation (Brown et al. 2020). Unlike other LLM architectures, this predictive capability suits tasks requiring creative output. However, open-ended text generation increases the risk of hallucinations – plausible but incorrect outputs – compared to tasks like summarization or rewriting, where source content guides the model (Bender et al. 2021, 610–12).
Discriminative models like BERT face different limitations, including their reliance on bidirectional context and unsuitability for generating creative text (Devlin et al. 2019, 4171–72). GPT-3’s transformer architecture processes entire contexts simultaneously via self-attention mechanisms, capturing complex cultural patterns, linguistic nuances, and stylistic elements (Vaswani et al. 2017). This enables culturally sensitive translations that preserve meaning and emotional impact, critical for marketing transcreation. Unlike bidirectional encoder models like BERT or versatile encoder-decoder models like T5, GPT-3 uses only the decoder component in a unidirectional (left-to-right) approach, optimizing it for generation. Pre-trained on vast datasets like Common Crawl, GPT-3 develops robust linguistic skills, including grammar, semantics, and world knowledge, by predicting next words across trillion-word corpora. However, its reliance on data without human-like reasoning can lead to biased, unsafe, or factually incorrect outputs, reflecting stereotypes or misinformation in its training data. Without true understanding, GPT-3 may produce harmful or misleading text, especially in high-stakes contexts, necessitating careful human oversight to mitigate ethical, legal, or safety risks (Tamkin et al. 2023, 4–6). Through fine-tuning or few-shot learning, GPT-3 adapts to new topics or styles with minimal examples, making it ideal for marketing content adaptation. This contrasts with models like BERT, which require explicit fine-tuning per task. OpenAI’s generative engines, such as Davinci, Babbage, and Ada, enhance GPT-3’s capabilities for specific applications (Tingiris 2021, 53). Integration into translation tools, like Matecat with GPT-4 for contextual explanations, highlights their creative potential. Skilled users must guide these tools, balancing their strengths and limitations, particularly for transcreation, where cultural nuance is essential. 
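Few-shot adaptation of the kind described above is driven entirely by the prompt text. A minimal sketch of how such a few-shot transcreation prompt might be assembled before being sent to a completion endpoint follows (the demonstration pair is invented for illustration, not taken from the study’s materials, and no API call is made):

```python
def build_fewshot_prompt(examples: list[tuple[str, str]], source_tagline: str) -> str:
    """Assemble a few-shot prompt: each example pairs a source tagline with a
    culturally adapted English version; the final line leaves the completion
    open for the model to fill in."""
    parts = ["Transcreate the tagline for a US audience, keeping its emotional impact."]
    for src, tgt in examples:
        parts.append(f"Source: {src}\nTranscreation: {tgt}")
    parts.append(f"Source: {source_tagline}\nTranscreation:")
    return "\n\n".join(parts)

# Invented demonstration pair; a real workflow would send the resulting
# string to a GPT-3 completion endpoint and post-edit the output.
prompt = build_fewshot_prompt(
    [("Taste the tradition", "Heritage in every bite")],
    "The last tools you will buy",
)
```

The point of the sketch is that GPT-3 needs no per-task fine-tuning: the examples themselves steer tone and register, which is what made prompt engineering a central skill in the training described later.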
5 Research Context and Methodology

As language technologies advance rapidly, they promise to transform text production and translation practices. In response, Iran’s undergraduate translation program revised its curriculum in 2017 to focus on emerging technologies, preparing students for a technology-driven industry. The updated curriculum replaced outdated courses with subjects like Translation and Technology, Translation Market, and Emerging Trends in Translation. These courses aim to build skills in using modern tools, understanding technology’s impact on the field, and mastering high-demand areas such as social media, website localization, transcreation, and copywriting.

This study investigates whether GPT-3 applications can enhance translation students’ transcreation skills within Iran’s technology-oriented curriculum. The revised national curriculum (Ministry of Science, Research and Technology 2022) at the BA level emphasizes localization, marketing translation, and the integration of advanced technologies in coursework, aligning with global trends in translation education that increasingly adopt AI tools (Mellinger 2019; Rodríguez-Castro 2018). While research on AI in translation education exists, studies on LLM applications for transcreation are limited. Daems and Macken (2019) reported improved outcomes from incorporating neural machine translation into training, and Kenny and Doherty (2014) provided frameworks for technology-enhanced pedagogy, which we adapted for our GPT-3 approach. Our study compares the quality of student translations using GPT-3 as a supplementary tool against those by professional translators without AI support, assessing whether such technologies can help students produce transcreations comparable to those from professionals.
5.1 Participants

Ten students in their final year of BA translation studies were chosen from the 27 enrolled in the Emerging Trends in Translation course at Damghan University, Iran. These students had completed prerequisite courses in Translation and Technology and Translation Market, covering AI in translation, audience analysis for digital marketing, and the role of content creation in modern translation.

At the study’s start, advanced chatbot interfaces like ChatGPT (powered by GPT-3.5) or later LLM-based tools (e.g., GPT-4) were not widely available. Integrating such tools into coursework was impractical, owing to limited access, fixed curricula, and ongoing classes, and switching tools mid-course could disrupt learning. Thus, we opted for specialized GPT-3-based applications tailored to our pedagogical goals, ensuring a consistent student experience. Students qualified through a screening exam testing proficiency in prompt¹ engineering, output refinement, and translating marketing texts using tools like CopyAI and Yaara. Only 10 of the 27 students achieved the required 80% score. Lack of prior GPT-3 experience was a mandatory criterion, ensuring a baseline comparable to that of the professional group.

The second group included 10 professional freelance translators. Initially, we sought specialists in transcreation and marketing translation, defined as professionals with at least 50% of their workload in these areas, formal marketing communications training, and 25+ transcreation projects for international brands. Because of recruitment challenges, we broadened the criteria to include translators with at least 5 years of full-time experience on diverse projects, including marketing materials. Screening confirmed all had completed at least 10 marketing translation projects, though such work comprised less than 30% of their portfolios.
Professionals were recruited from buyers of Yaademy’s computer-assisted translation (CAT) tool video tutorials, where the lead researcher is a technical consultant and curriculum developer. Tutorial costs were refunded to encourage participation. All professionals reported no prior experience with GPT-3 or related AI tools, a prerequisite for inclusion. Their highest qualifications were bachelor’s or master’s degrees in translation studies or related fields.

¹ A prompt refers to the initial text input given to an AI language model that serves as context or instructions for the model to generate a relevant response.

The sample size of 10 students and 10 professionals is a limitation. A larger sample would improve statistical power and generalizability. Constraints included the intensive GPT-3 training, detailed qualitative evaluations, and the difficulty of recruiting experienced professionals willing to participate. Future studies should use larger samples to confirm these findings.

5.2 Procedure

The study employed a mixed experimental design with a pre-test and post-test translation task, conducted under timed conditions by both the student group and the professional translators. In the pre-test, all participants, native Persian speakers, independently translated three brand motto taglines from Persian to English without GPT-3 access. The study ran from September to November 2022, with a 6-week training and experimentation period from mid-September to late October. Translations targeted English-speaking North American markets, primarily the United States, as specified in the translation brief. This focus reflects the ambition of Iranian companies to expand into Western markets, particularly the U.S., a key branding destination, despite political challenges. The selected companies were DottleBox (an ashtray producer), RareRead (a bookstore), and SharpPoint (a fishing equipment manufacturer).
Although the BA students had coursework in content creation and copywriting, neither group had prior GPT-3 training, making the pre-test a baseline of unaided translation ability. Following the pre-test, the student group received 6 weeks of intensive training on GPT-3 tools, including CopyAI, Texta.ai, and Yaara, through two 90-minute weekly sessions led by the researchers. While the researchers lacked formal AI content creation training, one specialized in technology and translation, staying informed on industry trends through seminars and webinars. Training focused on prompt engineering for marketing text and best-practice guides, with exercises generating outputs like product descriptions, meta descriptions, mottos, and captions.

5.3 Client Specifications

The translation brief provided to participants included specific client requirements to ensure the transcreated mottos aligned with marketing goals for North American audiences. Clients specified that mottos should be concise, using fewer than seven words to enhance memorability and recall, critical for effective brand communication. Additionally, mottos were required to reflect the brand’s core identity, be easily understandable, and resonate emotionally with U.S. consumers. These criteria – conciseness, branding representativeness, comprehensibility, and memorability – formed the basis for the evaluation rubric used by expert raters, ensuring translations met the clients’ expectations for culturally tailored, impactful marketing content.

5.4 Experiment Design

Of the 27 initial students, only 10 demonstrated sufficient post-training proficiency, assessed via a practical evaluation requiring tailored marketing content creation across three hypothetical brand scenarios using GPT-3 tools. Researcher-designed rubrics evaluated prompt engineering, output refinement, and localized translation quality.
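The client criteria described in the brief lend themselves to a simple programmatic representation. A minimal sketch follows (the class and function names are ours, for illustration; this is not the study’s actual scoring instrument):

```python
from dataclasses import dataclass


@dataclass
class MottoScore:
    """One rater's scores on the four client criteria, each out of 5 points."""
    conciseness: float
    representativeness: float
    comprehensibility: float
    memorability: float

    def total(self) -> float:
        # Out of 20, matching the rubric total described in the text.
        return (self.conciseness + self.representativeness
                + self.comprehensibility + self.memorability)


def meets_length_spec(motto: str, max_words: int = 7) -> bool:
    """Client brief: mottos should use fewer than seven words."""
    return len(motto.split()) < max_words
```

Only the length criterion is mechanically checkable; the other three require human (or rater-panel) judgment, which is why the study relies on expert scoring.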
Students needed an 80% score to qualify for the treatment group, ensuring that only those with strong skills influenced post-test outcomes. The professional group received no GPT-3 exposure or training, allowing a direct comparison of specialized AI training’s impact on student performance against professional standards. This design isolated the effect of GPT-3 proficiency on translation quality relative to established expertise.

Six weeks after the pre-test, qualified participants completed a post-test translation task designed as a simulated client request. They received background summaries for three companies aiming to adapt Persian motto taglines into memorable English slogans for North American consumers, particularly in the U.S. The companies were Match-Lite, a match producer since 1917, modernizing its motto “بهترین را انتخاب کنید” (Always choose the best); VidEdu, a youth-focused e-learning platform with the motto “یادگیری ساده و صمیمی” (Easy and friendly learning); and GentleTools, a toolbox manufacturer emphasizing durability with the motto “آخرین ابزاری که می‌خرید” (The last tools you will buy). Briefs included textual brand histories, style guidelines, and visual marketing materials like ads showing product use. (Company names are pseudonyms for privacy.)

Participants were instructed to create culturally resonant mottos tailored to U.S. consumer preferences. Students worked independently in a monitored computer lab, using personal resources like online dictionaries, translation memory databases, brand research, style guides, and glossaries. Professionals, geographically dispersed up to 1,000 miles apart, completed the task remotely within the same timeframe. To ensure consistency, their activities were tracked via screen recording and timed submission protocols, despite the differing settings.

A panel of five university translation professors with doctorates and expertise in marketing translation evaluated the translations. With over 60 years of combined experience in cross-cultural psychology, international marketing, copywriting, and advertising, the raters used a standardized rubric assessing four metrics – conciseness, branding representativeness, comprehensibility, and memorability – each scored up to 5 points for a 20-point total, aligning with Iran’s tertiary education standards. A pre-scoring norming session ensured consistent application of the criteria through exemplar discussions. To prevent bias, raters were blinded to translator group and GPT-3 use, with mottos presented in random order and identified by number.
Inter-rater agreement, measured by Fleiss’ kappa, was 0.79, indicating substantial agreement per Landis and Koch (1977). Some disagreement arose over conciseness, reflecting the subjective nature of translation quality assessment, even with structured rubrics (Bayer-Hohenwarter 2011; Hassani 2011; Doherty 2017). Raters also provided qualitative feedback on motto strengths and weaknesses, complementing the numeric scores and enriching insights. The evaluation’s rigor – standardized rubrics, expert raters, multi-method assessment, and inter-rater reliability – bolstered the validity and reliability of the translation quality assessment. Additional details on software tracking, remote monitoring, experimental conditions, and GPT-3 tool selection, while implemented, are omitted as they are not essential to the core methodology.

6 Findings and Discussion

While the pre-test results offer meaningful insights, our presentation of the findings centres primarily on the results of the post-test translation task. This allows us to concentrate on the main interest of this study – assessing the impact of GPT-3 tools on marketing translation quality after specialized training. Additionally, in the interest of brevity, only the salient details of the post-test most vital to conveying the key quantitative and qualitative findings are highlighted; further granular specifics are omitted.

Table 1. Total Scores for Student and Professional Groups on the Pre-test and Post-test (Total: 20).

Student Pre-test        Student Post-test        Professional Pre-test    Professional Post-test
DottleBox      6.2      MatchLite      12.1      DottleBox      11.4      MatchLite       8.2
RareRead       8.5      VidEdu         14.2      RareRead       11.9      VidEdu         12.3
SharpPoint     7.4      GentleTools    13.9      SharpPoint      9.8      GentleTools    14.9
Mean          7.36      Mean           13.4      Mean          11.03      Mean           11.8

To situate the post-test results, the pre-test performances establish an informative baseline.
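For reference, the Fleiss’ kappa statistic used above to gauge rater agreement can be computed from a subjects-by-categories count matrix. A minimal sketch with invented toy data (not the study’s actual ratings):

```python
def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa for a subjects x categories matrix, where counts[i][j]
    is the number of raters assigning subject i to category j."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    # Observed agreement: mean per-subject agreement P_i
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_subjects
    # Chance agreement: from the overall category proportions
    col_totals = [sum(row[j] for row in counts) for j in range(len(counts[0]))]
    p_e = sum((t / (n_subjects * n_raters)) ** 2 for t in col_totals)
    return (p_bar - p_e) / (1 - p_e)

# Toy data: five raters, three items, two categories; unanimous raters
# yield kappa = 1.0.
unanimous = [[5, 0], [0, 5], [5, 0]]
```
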
As shown in Table 1, the professionals initially outperformed the students by nearly 4 points before GPT-3 training, with average scores of 11.03 and 7.36, respectively. For the DottleBox brand of ashtrays, professionals scored 11.4 points versus 6.2 for the students. Similarly, for RareRead, a bookstore specializing in rare books, professionals received 11.9 points, while students managed 8.5. This decisive advantage persisted for SharpPoint, a fishing hook manufacturer, where professionals led students by 2.4 points. Professionals seem to have leveraged extensive real-world experience to demonstrate superiority in all metrics during the unaided pre-test translation. During the evaluation process, all translations were randomized and identified only by number, with raters blinded to translator group membership and GPT-3 usage to prevent potential rating bias based on expected group differences.

However, the narrative shifted dramatically in the post-test, after students received specialized training in GPT-3-powered applications. They gained a striking 6.04-point boost over their pre-test performance, while professionals improved by a marginal 0.77 points. Ultimately, students secured a 1.6-point post-test advantage over the professionals, exhibiting marked gains across most metrics.

Delving deeper into the post-test results yields additional insight. Students substantially outperformed professionals in all metrics when translating the MatchLite motto. However, for VidEdu, this margin narrowed, and for GentleTools, professionals exceeded students in certain metrics, like conciseness. Conciseness in this context refers to the ability to convey the brand message in a minimum of words while maintaining impact: specifically, mottos using fewer than seven words scored higher. Still, when compiled in aggregate across all companies, students secured decisive leads in every metric, with the widest gap observed in conciseness.
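The group means and gains cited here follow directly from the per-brand totals in Table 1, as a quick arithmetic check shows:

```python
def mean(values: list[float]) -> float:
    return sum(values) / len(values)

# Per-brand totals (out of 20) from Table 1
student_pre = [6.2, 8.5, 7.4]          # DottleBox, RareRead, SharpPoint
student_post = [12.1, 14.2, 13.9]      # MatchLite, VidEdu, GentleTools
professional_pre = [11.4, 11.9, 9.8]
professional_post = [8.2, 12.3, 14.9]

student_gain = mean(student_post) - mean(student_pre)    # about 6.0 points
post_gap = mean(student_post) - mean(professional_post)  # about 1.6 points
```
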
G. Hassani, M. Malekshahi, H. Davari: AI-Powered Transcreation in Global Marketing: Insights from Iran

While variances emerged across brands, preventing simplistic interpretations, the aggregated data suggests that GPT-3 tools enabled students to produce higher-scoring motto translations overall. The abbreviations S/P, C, R, U, M, and OA represent the student translator or professional translator, the average scores for conciseness, representability, comprehensibility (U standing for the synonymic expression “understandability”), memorability, and the overall average score, respectively.

In the case of Match-Lite, as presented in Table 2, the top three mottos as rated by all five evaluators were: A match from hell itself! (overall score 20), Let there be light (18), and Alight from the WWI (15). A match from hell itself! scored highly for its cheekiness, wit, and modernity, which contributed to its high memorability. Let there be light received full points from four raters for conciseness, representativeness, and memorability, although one deducted points for limited comprehensibility among non-religious people. The biblical reference resonated as a familiar, relatable story. Alight from the WWI was praised for conciseness, representability, and comprehensibility, but scored lower in memorability because of its lack of rhythmic, catchy phrasing. The historical reference to Match-Lite’s 1917 founding was appreciated. Two other top mottos scored 14 points. A match as cool as its name lost points for conciseness, at seven words. The raters were struck by the contradiction of a “cool” yet burning match. The other 14-point motto, A lifetime at your fingertips, was praised for evoking Match-Lite’s heritage and implying availability, but some found the phrasing vague.

Table 2. Mottos & Scores: Match-Lite.[1]

No. | Motto                                            | S/P | C  | R  | U    | M    | OA
1   | A match from hell itself!                        | S   | 5  | 5  | 5    | 5    | 20
2   | Let there be light.                              | S   | 5  | 5  | 3    | 5    | 18
3   | Alight from the WWI[2]                           | S   | 5  | 4  | 3.5  | 2.5  | 15
4   | A match as cool as its name                      | S   | 2  | 4  | 4    | 4    | 14
5   | A lifetime at your fingertips                    | S   | 5  | 2  | 3    | 4    | 14
6   | A flame to light your life                       | S   | 5  | 3  | 3    | 2    | 13
7   | A company rich in history but young at heart     | P   | 1  | 2  | 4    | 4    | 11
8   | Match-Lite, A Matchless Match                    | P   | 4  | 1  | 2    | 3    | 10
9   | Goodnight sweetheart                             | P   | 4  | 0  | 2    | 4    | 10
10  | Life is short, match a smile                     | P   | 3  | 1  | 3    | 3    | 10
11  | Match-Lite: Tradition & Innovation               | S   | 4  | 1  | 3    | 2    | 10
12  | From the Ashes to the Top                        | S   | 3  | 3  | 1    | 2    | 9
13  | Match-Lite, your number one choice               | P   | 3  | 0  | 4    | 1    | 9
14  | Where fire and flame are our business            | P   | 0  | 3  | 3    | 2    | 8
15  | Match-Lite strikes anywhere you want             | P   | 2  | 2  | 2    | 1    | 7
16  | Enjoy the relaxation of a striking flame         | P   | 0  | 3  | 2    | 1    | 6
17  | It’s not just a light, it’s an experience.       | P   | 0  | 2  | 3    | 1    | 6
18  | Match-Lite is where Safe Protection Meets Beauty | P   | 1  | 1  | 2    | 1    | 5
19  | Match-Lite, always for your biggest adventures   | S   | 0  | 0  | 2    | 2    | 4
20  | The resource for professional fireworks displays | S   | 0  | 1  | 2    | 1    | 4
    | Total Students                                   | N/A | 34 | 28 | 29.5 | 29.5 | 121
    | Professionals                                    | N/A | 19 | 15 | 27   | 21   | 82

[2] The phrase “Alight from the WWI” contains grammatical errors (inappropriate preposition “from” and unnecessary definite article “the”). This AI-generated motto was included in our survey in its original form to maintain data integrity. The errors highlight the need for human oversight in AI responses.

However, we acknowledge that assessment of marketing text quality inherently involves subjective elements. Despite rigorous rubrics and norming sessions, individual rater preferences, cultural backgrounds, and personal interpretations of effective marketing language inevitably influenced scoring decisions. Analysis revealed inconsistencies in conciseness ratings for motto translations. For instance, translation #7 in Table 2 (9 words) scored an average conciseness of 1, while translation #20 (6 words, meeting client specifications) scored 0. Despite high inter-rater agreement, raters’ divergent views on conciseness – shaped by individual backgrounds, cultural perspectives, or preferences – caused discrepancies. Some prioritized word count, while others valued impactful longer phrases. To improve consistency, future studies should define conciseness clearly, e.g., a seven-word limit or balancing brevity with emotional resonance. Rater training is crucial to align scoring criteria and ensure reliable evaluations (Han 2020, 267–68; Doherty 2017, 142).

Table 3. Mottos & Scores: VidEdu.

No. | Motto                                                      | S/P | C  | R  | U  | M  | OA
1   | Learn. Earn                                                | S   | 5  | 5  | 5  | 5  | 20
2   | Learn$$$                                                   | S   | 5  | 5  | 5  | 4  | 19
3   | Education Reimagined                                       | P   | 5  | 5  | 4  | 4  | 18
4   | A “Guru” at your fingertips                                | P   | 4  | 4  | 5  | 4  | 17
5   | Ready for your next AHA moment?                            | P   | 4  | 4  | 4  | 5  | 17
6   | Your gateway to knowledge                                  | S   | 5  | 3  | 4  | 4  | 16
7   | From zero to hero                                          | S   | 5  | 3  | 4  | 4  | 16
8   | Imagine. Learn. Succeed.                                   | S   | 5  | 4  | 4  | 3  | 16
9   | Learn anytime, anywhere.                                   | S   | 5  | 4  | 4  | 3  | 16
10  | Get smarter faster                                         | S   | 5  | 3  | 4  | 3  | 15
11  | Learn today. Lead tomorrow.                                | P   | 4  | 4  | 3  | 3  | 14
12  | Be pro. Be seen. Be noticed.                               | P   | 3  | 2  | 4  | 4  | 13
13  | VidEdu – Learning is easy.                                 | P   | 4  | 3  | 4  | 2  | 13
14  | Education Powered by VidEdu                                | P   | 4  | 3  | 4  | 2  | 13
15  | Wanna learn?                                               | S   | 5  | 2  | 3  | 3  | 13
16  | VidEdu, one-stop shop for online education                 | P   | 1  | 3  | 3  | 3  | 10
17  | Cutting through the clutter of the Internet                | S   | 0  | 1  | 4  | 3  | 8
18  | You’ve come to the right place.                            | P   | 0  | 0  | 3  | 1  | 4
19  | Your source for quick, easy and AFFORDABLE video tutorials | P   | 0  | 2  | 2  | 0  | 4
20  | Enjoy the ride on your journey with VidEdu.                | S   | 0  | 2  | 1  | 0  | 3
    | Total Students                                             | N/A | 40 | 32 | 38 | 32 | 142
    | Professionals                                              | N/A | 29 | 30 | 36 | 28 | 123

Examining the total points awarded for each parameter by translator group reveals two key takeaways. First, while comprehensibility and memorability scores differ by 6 points for professionals, students scored identically in both. Whether this results from using GPT-3 applications is debatable.
Second, the most significant gap between groups is in conciseness (15 points), while comprehensibility differs little (2.5 points). So, while GPT-3 appears to substantially improve conciseness, comprehensibility gains seem marginal. When looking at the top VidEdu mottos, the student-generated Learn. Earn. scored the highest with 20 points. Close behind in second was Learn$$$ with 19 points, docked 1 point for recognition value despite its aesthetic appeal. For the professional mottos, the top three contenders were Education Reimagined (18 points), A “Guru” at your fingertips (17 points), and Ready for your next AHA moment? (also 17 points). Students rounded out places 6–10 before the professional and student mottos began intermingling in the rankings below the top 10 (Table 3). Examining the total points given for each metric reveals two notable findings. First, both student and professional groups exhibited markedly higher average scores across all assessment criteria relative to the Match-Lite motto translations. Students demonstrated the most pronounced score increase in comprehensibility (up 8.5 points), while professionals saw the greatest score growth in branding representativeness (up 15 points). Second, whereas students had outperformed professionals by a sizeable 39-point margin in Match-Lite, this score differential shrank dramatically to just 19 points for the VidEdu motto. The reasons behind the professionals’ stronger VidEdu performance are largely ambiguous but potentially attributable to their comparatively greater real-world experience with adapting messaging for the education sector. Meanwhile, students’ elevated scores may partially stem from the disproportionately abundant textual data on education versus matches in GPT-3’s training corpus. The vastly larger volume of education-related material likely enabled GPT-3 to generate more context-appropriate suggestions tailored to an education-focused brand like VidEdu.
For GentleTools, two mottos received full marks: Blessed are the Gentle and Boring Done Fun! Evaluators praised the former for alluding to the biblical verse on the meek inheriting the earth. While the Bible quotation Let there be light ranked second for Match-Lite, having lost 2 comprehensibility points, the raters felt the GentleTools verse would resonate more universally. Interestingly, the student who proposed these winning mottos ranked only 14th in the pre-test phase. However, after gaining access to GPT-3 tools, she seems to have experienced a boost in creativity. She cleverly utilized CopyAI’s “more like this” feature to generate the biblical motto for GentleTools, modelling it after her own Match-Lite entry that had ranked second. Evaluators ultimately rated her GentleTools motto as the top suggestion, while her Match-Lite entry took second place. Once again, this example highlights the fact that interpreting qualitative branding metrics inevitably allows some rater discretion. As Bayer-Hohenwarter (2011, 97) explains, “the subjective has to be acknowledged as an inevitable ingredient in any TQA recipe.” Raters also appreciated the play on words in Boring Done Fun! around tools for boring and boring as the opposite of fun. The next mottos, You break it. We fix it (student) and When the tough get going! (professional), scored closely behind. Professionals collected the next 5 spots, #5–9. Table 4 breaks down the GentleTools motto ratings.

Table 4. Mottos & Scores: GentleTools.

No. | Motto                                       | S/P | C  | R  | U  | M  | OA
1   | Blessed are the Gentle                      | S   | 5  | 5  | 5  | 5  | 20
2   | Boring Done Fun!                            | S   | 5  | 5  | 5  | 5  | 20
3   | You break it. We fix it.                    | S   | 4  | 5  | 5  | 5  | 19
4   | When the tough get going!                   | P   | 4  | 4  | 5  | 5  | 18
5   | Riding the wave of industrialization        | P   | 4  | 5  | 4  | 4  | 17
6   | The dream of a craftsman                    | P   | 4  | 4  | 4  | 5  | 17
7   | GentleTools: A Solid Choice                 | P   | 5  | 3  | 4  | 5  | 17
8   | WORK SMARTER NOT HARDER                     | P   | 5  | 4  | 4  | 4  | 17
9   | Gentle as a butterfly, stinging as a bee.   | P   | 1  | 5  | 5  | 5  | 16
10  | Quality in, Quality out.                    | S   | 5  | 2  | 5  | 3  | 15
11  | The right tool for the right job            | S   | 2  | 5  | 5  | 3  | 15
12  | Tough tool for a tough job                  | S   | 3  | 4  | 4  | 4  | 15
13  | Exceptional design. Exceptional durability. | P   | 4  | 2  | 4  | 3  | 13
14  | Inspiration for innovation                  | S   | 4  | 2  | 4  | 3  | 13
15  | GentleTools: A name you can trust.          | S   | 2  | 1  | 5  | 4  | 12
16  | Built to last a lifetime                    | P   | 4  | 2  | 3  | 3  | 12
17  | GentleTools: where power meets integrity    | P   | 3  | 3  | 3  | 2  | 11
18  | Unrivalled Function. Unbeatable Value       | P   | 5  | 1  | 3  | 2  | 11
19  | Don’t let their small size fool you.        | S   | 0  | 2  | 2  | 4  | 8
20  | Gentle Like a Woman. Tough Like a Man.      | S   | 0  | 1  | 1  | 0  | 2
    | Total Students                              | N/A | 30 | 32 | 41 | 36 | 139
    | Professionals                               | N/A | 39 | 33 | 39 | 38 | 149

Two evaluators made an astute observation regarding motto quality: the various motto parameters must work cohesively rather than be assessed fully in isolation for maximal impact. In other words, a motto may score highly in the individual metrics of conciseness, representativeness, comprehensibility, and memorability, yet still fail to be a compelling motto when those components do not synthesize into a cohesive and impactful whole. Simply excelling at each separate criterion does not guarantee that the final assembled motto will resonate powerfully with audiences. Evaluators emphasized that a transcreated motto’s success hinges on achieving a harmonious gestalt where the distinct elements coalesce into a seamless brand statement that lands persuasively. For example, Gentle Like a Woman, Tough Like a Man may be memorable and comprehensible, and may represent the brand reasonably well, but it perpetuates harmful gender stereotypes that could alienate customers, warranting a low overall score. In contrast, the similarly structured motto Gentle as a butterfly, stinging as a bee resonated more fittingly with GentleTools’ desired branding of admirable yet non-toxic masculinity.
It directly references the iconic motto Float like a butterfly, sting like a bee coined by legendary boxer Muhammad Ali to describe his graceful yet hard-hitting fighting style. Evaluators felt this motto expertly conveyed GentleTools’ branding aim to celebrate Ali’s principled model of conviction and resilience in line with themes of durability. By promoting Ali’s strength of character and principles, the adapted motto aligned admirably with the company’s desired values. It received high ratings of 5 in three metrics and lost points only in conciseness. This culminated in an apt overall score of 16 points. This example illustrates how a motto’s overall impression can exceed the sum of its individual trait ratings, a phenomenon noted in complexity theory (Blumczynski and Hassani 2019; Marais 2021; Marais and Meylaerts 2022). However, this observation arose unexpectedly since the raters had been expressly instructed to appraise isolated qualities, not holistic value. Including both detailed dimensional ratings and consolidated overall scores could have allowed us to explore this effect. Additionally, as powerful as LLMs like GPT-3 may be, they are not immune to perpetuating the genuine societal biases that are embedded in their training data. This raises critical ethical considerations regarding the responsible deployment of such AI systems. The biases exhibited by language models like GPT-3 reflect a broader issue afflicting AI systems across domains. For instance, AI-powered image generation tools like Midjourney or OpenAI’s DALL-E also show such biases; when asked to generate an image of a CEO, they usually depict a white male, likely mirroring stereotypical patterns in their training data. Conversely, in what seems like an over-correction driven by “wokism” interests to counteract this failure, Google’s Gemini model exhibits the opposite tendency. 
According to The Economist (2024), “[t]he tech giant’s new artificial-intelligence model invents black Vikings and Asian popes” in an apparent attempt to diversify representations. Similarly, the problematic gender stereotype suggestion Gentle Like a Woman, Tough Like a Man from GPT-3 likely stems from ingrained biases present in its textual training corpora. The model essentially amplifies societal patterns it discovers in the data. This concern is not limited to language models alone. Studies have shown that facial recognition tools exhibit poorer performance on minority groups when trained on datasets overrepresenting majority demographics (Howard et al. 2022). Just as racial biases emerge in some computer vision systems, language models that internalize imbalanced representations can propagate gender and other biases. Given the expanding global use of AI translation technologies, like e-commerce companies localizing for international audiences, the implications of circulating biased outputs could prove reputationally and financially detrimental (Zhang et al. 2021) to the companies deploying these systems and potentially harmful to the diverse consumer groups they aim to serve. More broadly, scholars like Bostrom (2014) and Harari (2018) have flagged threats of AI dominance across society.

7 Findings Based on Student Feedback

Student participants offered crucial insights into using GPT-3 tools. Their feedback came through comments, tracking data, and audio messages, allowing triangulation across multiple qualitative sources. Surprisingly, despite both groups having equal opportunity to comment, most responses came from the students rather than the professionals. Students’ perspectives, derived from diverse feedback channels, inform some of this study’s key findings, including the following: With proper training in leveraging these AI tools, student translators managed to surpass professionals in some key translation quality metrics.
However, fully harnessing the potential of LLMs like GPT-3 relies wholly on human discernment and skill. Students shared illuminating examples from working with the applications. One student was tasked with translating the VidEdu motto Easy and friendly learning from Persian to English. His initial plain prompt yielded uninspired suggestions like Simple and amicable education. However, after researching VidEdu’s fun, animated video-based courses, the student enhanced the prompt with vivid contextual details. This sparked creative alternatives like Engaging courses for the YouTube generation! and Making learning as fun as YouTube. Without added context, these suggestions remained lacklustre. But small tweaks adding colour significantly improved results, though conciseness specifications still prevented selecting the most aesthetic options. Additionally, students emphasized thoroughly researching branding tone and context when formulating prompts. For example, when translating the slogan for GentleTools, an outdoor toolbox company, students learned that energetic, rugged prompts yielded suggestions conveying durability like Built to endure the elements. However, more refined prompts generated incompatible suggestions alluding to luxury. Elaborating further, students observed differences even across AI tools. When inputting the same GentleTools prompt into CopyAI versus Texta.ai, noticeably distinct suggestions emerged. CopyAI proposed rugged slogans fitting the durable brand, such as Equipped for adventure. In contrast, Texta.ai generated refined suggestions alluding to luxury, like The finest instruments for the discerning craftsman. The reason these applications produce varying suggestions likely stems in part from differences in their settings.
As discussed previously, while these tools share the GPT-3 foundation, factors like the AI engine (Curie, DaVinci, etc.) and hyperparameters such as temperature still differ, impacting text generation qualities. Temperature refers to how deterministic or variable the responses can be (Tingiris 2021, 54–57). Because each application’s settings and configuration shape its outputs, different tools may serve different purposes or handle certain cases more effectively, which underscores the value of extensive experimentation to determine which tool aligns best with the intended messaging and branding goals for a particular project. The divergence between CopyAI and Texta.ai illustrates why translators and language service providers should explore various options, as each may offer distinct features, strengths, or specialized capabilities better suited to the task at hand. This principle extends beyond specialized GPT-3 tools to state-of-the-art LLM-powered chatbots like ChatGPT, Claude, or Google’s Gemini, since each offers complementary functionalities that may prove advantageous in different scenarios. In this case, CopyAI’s bold outdoor slogans resonated strongly with GentleTools’ desired branding, underscoring the vital human role in judiciously steering these technologies based on branding needs and audience preferences.
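The effect of the temperature setting can be illustrated with temperature-scaled softmax sampling, the general mechanism by which GPT-3-style models turn scores over candidate tokens into probabilities. A minimal sketch (the logit values are invented for illustration):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw token scores into probabilities. Lower temperature
    concentrates probability on the top candidate (more deterministic);
    higher temperature spreads it out (more variable)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                        # hypothetical scores for three tokens
cool = softmax_with_temperature(logits, 0.5)    # near-deterministic output
warm = softmax_with_temperature(logits, 2.0)    # more varied output
print(cool[0] > warm[0])  # True: low temperature favours the top token
```

Two applications built on the same model but configured with different temperatures would thus sample from noticeably different distributions, one explanation for the divergent CopyAI and Texta.ai suggestions.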
8 Navigating the AI Translation Frontier: Prospects and Considerations

This study indicates that with proper training, student translators can effectively use GPT-3 to improve marketing translation quality, surpassing professionals in adapting brand mottos after focused instruction. However, the small, specialized sample limits broad conclusions, as outliers or sampling quirks may have skewed the results. Future research should examine effects across diverse languages, text types, and evolving tools, and include consumer feedback through market testing to evaluate real-world responses to AI-assisted versus professional transcreations. Unlike traditional machine translation, LLMs like GPT-3 are not solely designed for translation. Trained on vast, varied datasets, they support applications like question-answering, post-editing, terminology extraction, and transcreation (Kenny 2022). This versatility, however, introduces risks such as hallucinations, biases, or logical inconsistencies, which can complicate critical translation tasks (Nimdzi Insights 2023). The study’s competitive pre/post-test format, with timed lab sessions, remote proctoring, and peer competition, prioritized variable control but reduced ecological validity. Real-world translation typically occurs independently, without surveillance or rigid constraints, limiting the applicability of findings to professional settings. Future studies in natural work environments could enhance authenticity. Comparing students and professionals under different conditions risks conflating factors. Professionals had more experience but no GPT-3 training, while students faced academic pressure and peer competition. Despite this, the comparison offers value. It benchmarks student readiness against industry standards, highlighting gaps to bridge for career entry, especially under Iran’s revised curriculum emphasizing practical skills.
Additionally, students’ post-test success underscores the need for professionals to pursue ongoing training to stay competitive amid advancing technologies. By comparing groups and assessing specialized training, this study highlights the need to equip students with modern skills and encourage lifelong learning among professionals. As language technologies evolve, both groups must adapt. The findings suggest GPT-3’s potential to enhance marketing translation when guided by skilled users through thoughtful prompting and experimentation. Prompt engineering and LLM literacy emerge as essential skills. While GPT-3 can generate locale-specific suggestions, its effectiveness depends on human direction, as seen in students’ tailored outputs. This reinforces the enduring need for human oversight, aligning with studies showing translators value tools that support their goals but struggle with inflexible technology (Ruokonen and Koskinen 2017). Echoing Douglas Adams’ The Hitchhiker’s Guide to the Galaxy, where the Babel fish’s literal translations caused cultural misunderstandings, unchecked LLMs risk similar errors by amplifying biases or generating implausible content. Yet, when translators refine outputs through iterative prompting, combining human expertise with AI’s pattern recognition, they create a powerful synergy. This “augmented translation” approach – where AI handles repetitive tasks and offers creative options, while humans provide cultural insight and judgment – enhances outcomes. As technology and human expertise continue to integrate, translation workflows will likely embrace this collaborative model, delivering superior results through thoughtful partnership.

References

Bayer-Hohenwarter, Gerrit. 2011. “‘Creative shifts’ as a means of measuring and promoting translational creativity.” Meta 56 (3): 663–92. https://doi.org/10.7202/1008339ar.
Bender, Emily M., Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell.[3] 2021.
“On the dangers of stochastic parrots: Can language models be too big?” In FAccT ’21: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610–23. Association for Computing Machinery.
Blumczynski, Piotr, and Ghodrat Hassani. 2019. “Towards a meta-theoretical model for translation: A multidimensional approach.” Target: International Journal of Translation Studies 31 (3): 328–51. https://doi.org/10.1075/target.17031.blu.
[3] The author Margaret Mitchell intentionally used the pseudonym “Shmargaret Shmitchell” in this publication.
Bostrom, Nick. 2014. Superintelligence: Paths, Dangers, Strategies. Oxford University Press.
Bowker, Lynne. 2023. De-mystifying Translation: Introducing Translation to Non-translators. Routledge.
Brown, Tom, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal, Arvind Neelakantan, et al. 2020. “Language models are few-shot learners.” Advances in Neural Information Processing Systems 33: 1877–1901. https://doi.org/10.48550/arXiv.2005.14165.
The Business Research Company. 2025. AI in Language Translation Global Market Report 2025. https://www.thebusinessresearchcompany.com/report/ai-in-language-translation-global-market-report.
Cronin, Michael. 2013. Translation in the Digital Age. Routledge.
Daems, Joke, and Lieve Macken. 2019. “Interactive adaptive SMT versus interactive adaptive NMT: A user experience evaluation.” Machine Translation 33 (1): 117–34. https://doi.org/10.1007/s10590-019-09230-z.
Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. “BERT: Pre-training of deep bidirectional transformers for language understanding.” In Proceedings of NAACL-HLT 2019, 4171–86. Association for Computational Linguistics.
Díaz-Millón, Mar. 2021.
“The role of transcreation in corporate communication: A case study in the US healthcare sector.” In Innovative Perspectives on Corporate Communication in the Global World, edited by María Dolores Olvera-Lobo, Juncal Gutiérrez-Artacho, and Irene Rivera-Trigueros, 159–76. IGI Global.
Díaz-Millón, Mar, and María Dolores Olvera-Lobo. 2023. “Towards a definition of transcreation: A systematic literature review.” Perspectives 31 (2): 347–64. https://doi.org/10.1080/0907676X.2021.2004177.
Doherty, Stephen. 2017. “Issues in human and automatic translation quality assessment.” In Human Issues in Translation Technology, edited by Dorothy Kenny, 131–48. Routledge.
The Economist. 2024. “Is Google’s Gemini chatbot woke by accident, or by design?” The Economist, February 28. https://www.economist.com/united-states/2024/02/28/is-googles-gemini-chatbot-woke-by-accident-or-design.
Han, Chao. 2020. “A critical methodological review of translation quality assessment.” The Translator 26 (3): 257–73. https://doi.org/10.1080/13556509.2020.1834751.
Harari, Yuval N. 2018. 21 Lessons for the 21st Century. Jonathan Cape.
Hassani, Ghodrat. 2011. “A corpus-based evaluation approach to translation improvement.” Meta 56 (2): 351–73. https://doi.org/10.7202/1006181ar.
Ho, Nga-Ki Mavis. 2024. Appraisal and the Transcreation of Marketing Texts: Persuasion in Chinese and English. Routledge.
Howard, John J., Eli J. Laird, Rebecca E. Rubin, Yevgeniy B. Sirotin, and Jerry L. Tipton. 2022. “Evaluating proposed fairness models for face recognition algorithms.” In International Conference on Pattern Recognition, 431–47. Springer Nature Switzerland.
Katan, David. 2001. “When difference is not dangerous: Modelling intercultural competence for business.” Textus XIV (2): 287–306.
—. 2013. “Intercultural mediation.” In Handbook of Translation Studies, edited by Yves Gambier and Luc Van Doorslaer, 84–91. John Benjamins.
—. 2016.
“Translation at the cross-roads: Time for the transcreational turn?” Perspectives 24 (3): 365–81. https://doi.org/10.1080/0907676X.2015.1016049.
—. 2018. “‘Translatere’ or ‘transcreare’: In theory and in practice and by whom?” In Translating and Interpreting Specific Texts, Contexts and Translation, edited by Cinzia Spinzi, Alessandra Rizzo and Marianna Lya Zummo, 139–60. University of Salento.
Katan, David, and Mustapha Taibi. 2021. Translating Cultures: An Introduction for Translators, Interpreters and Mediators. 3rd ed. Routledge.
Kenny, Dorothy. 2022. Machine Translation for Everyone: Empowering Users in the Age of Artificial Intelligence. Language Science Press.
Kenny, Dorothy, and Stephen Doherty. 2014. “Statistical machine translation in the translation curriculum: Overcoming obstacles and empowering translators.” The Interpreter and Translator Trainer 8 (2): 276–94. https://doi.org/10.1080/1750399X.2014.936112.
Landis, J. Richard, and Gary G. Koch. 1977. “The measurement of observer agreement for categorical data.” Biometrics 33 (1): 159–74. https://doi.org/10.2307/2529310.
Łukasik, Marek. 2024. “The future of the translation profession in the era of artificial intelligence: Survey results from Polish translators, translation trainers, and students of translation.” Lublin Studies in Modern Languages and Literature 48 (3): 25–39. https://doi.org/10.17951/lsmll.2024.48.3.25-39.
Marais, Kobus. 2021. “Complexity in translation studies.” In Handbook of Translation Studies, edited by Yves Gambier and Luc Van Doorslaer, 23–29. John Benjamins.
Marais, Kobus, and Rein Meylaerts. 2022. “Introduction.” In Exploring the Implications of Complexity Thinking for Translation Studies, edited by Kobus Marais and Rein Meylaerts, 1–6. Routledge.
Martin, J.R., and P.R.R. White. 2005. The Language of Evaluation: Appraisal in English. Palgrave Macmillan.
Mellinger, Christopher D. 2019.
“Computer-assisted interpreting technologies and interpreter cognition: A product and process-oriented perspective.” Tradumàtica: Tecnologies de la Traducció 17: 33–44. https://doi.org/10.5565/rev/tradumatica.228.
Mitchell-Schuitevoerder, Rosemary. 2020. A Project-Based Approach to Translation Technology. Routledge.
Moorkens, Joss. 2020. “‘A tiny cog in a large machine’: Digital Taylorism in the translation industry.” Translation Spaces 9 (1): 12–34. https://doi.org/10.1075/ts.00019.moo.
Nimdzi Insights. 2022. The Nimdzi 2022 Language Technology Atlas. https://www.nimdzi.com/nimdzi-language-technology-atlas-2022/.
—. 2023. The Nimdzi 2023 Language Technology Atlas. https://www.nimdzi.com/language-technology-atlas/.
—. 2024. The 2024 Nimdzi 100. https://www.nimdzi.com/nimdzi-100-2024/.
—. 2025. The 2025 Nimdzi 100. https://www.nimdzi.com/nimdzi-100-2025.
Olohan, Maeve. 2017. “Technology, translation and society: A constructivist, critical theory approach.” Target 29 (2): 264–83. https://doi.org/10.1075/target.29.2.04olo.
Pielmeier, Hélène, and Paul O’Mara. 2020. The State of the Linguist Supply Chain. CSA Research.
Rodríguez-Castro, Mónica. 2018. “An integrated curricular design for computer-assisted translation tools: Developing technical expertise.” The Interpreter and Translator Trainer 12 (4): 355–72. https://doi.org/10.1080/1750399X.2018.1502007.
Ruokonen, Minna, and Kaisa Koskinen. 2017. “Dancing with technology: Translators’ narratives on the dance of human and machinic agency in translation work.” The Translator 23 (3): 310–23. https://doi.org/10.1080/13556509.2017.1301846.
Tamkin, Alex, Miles Brundage, Jack Clark, and Deep Ganguli. 2023. “Understanding the capabilities, limitations, and societal impact of large language models.” arXiv: 1–8. https://doi.org/10.48550/arXiv.2102.02503.
Tingiris, Steve. 2021. Exploring GPT-3. Packt Publishing.
Torresi, Ira. 2010. Translating Promotional and Advertising Texts. Routledge.
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. “Attention is all you need.” Advances in Neural Information Processing Systems 30: 5998–6008. https://doi.org/10.48550/arXiv.1706.03762.
Way, Andy. 2020. “Machine translation: Where are we at today?” In The Bloomsbury Companion to Language Industry Studies, edited by Erik Angelone, Maureen Ehrensberger-Dow, and Gary Massey, 311–32. Bloomsbury Academic.
Zhang, Daniel, Saurabh Mishra, Erik Brynjolfsson, John Etchemendy, Deep Ganguli, Barbara Grosz, Terah Lyons, James Manyika, Juan Carlos Niebles, Michael Sellitto, Yoav Shoham, Jack Clark, and Raymond Perrault. 2021. The AI Index 2021 Annual Report. AI Index Steering Committee, Human-Centered AI Institute, Stanford University.

List of Contributors

Aja Barbič, University of Maribor, Slovenia, aja.barbic@student.um.si
Mladen Borovič, University of Maribor, Slovenia, mladen.borovic@um.si
Marija Brala Vukanović, University of Rijeka, Croatia, marija.brala@ffri.uniri.hr
Hossein Davari, Damghan University, Iran, h.davari@du.ac.ir
Melanija Larisa Fabčič, University of Maribor, Slovenia, melanija.fabcic@um.si
Andrej Flogie, University of Maribor, Slovenia, andrej.flogie@um.si
Nataša Gajšt, University of Maribor, Slovenia, natasa.gajst@um.si
Daniel Hari, University of Maribor, Slovenia, daniel.hari@um.si
Ghodrat Hassani, Damghan University, Iran, q.hassani@du.ac.ir
Tommy Hastomo, Universitas Negeri Malang (State University of Malang), Indonesia, tomhas182@gmail.com
David Hazemali, University of Maribor, Slovenia, david.hazemali@um.si
Francisca Maria Ivone, Universitas Negeri Malang (State University of Malang), Indonesia, fransicamaria@um.ac.id
Eva Jakupčević, University of Split, Croatia, ejakupcevic@ffst.hr
Saša Jazbec, University of Maribor, Slovenia, sasa.jazbec@um.si
Muhammad Fikri Nugraha Kholid, Universitas Islam Negeri Raden Intan Lampung (Raden Intan State Islamic University Lampung), Indonesia,
fikrikholid44@gmail.com
Agata Križan, University of Maribor, Slovenia, agata.krizan@um.si
Rashmika Lekamge, Sabaragamuwa University of Sri Lanka, Sri Lanka, rashmi@geo.sab.ac.lk
Bernarda Leva, University of Maribor, Slovenia, bernarda.leva@um.si
Marta Licardo, University of Maribor, Slovenia, marta.licardo@um.si
Marziyeh Malekshahi, Damghan University, Iran, m.malekshahi@du.ac.ir
Silvana Neshkovska, University “St. Kliment Ohridski”, Bitola, North Macedonia, silvana.neskovska@uklo.edu.mk
Tomaž Onič, University of Maribor, Slovenia, tomaz.onic@um.si
Zmago Pavličič, University of Maribor, Slovenia, zmago.pavlicic@gmail.com
Bojan Prosenjak, University of Zagreb, Croatia, bprosenj@m.ffzg.hr
Andini Septama Sari, Universitas Negeri Malang (State University of Malang), Indonesia, andinisari@gmail.com
Clayton Smith, University of Windsor, Canada, clayton.smith@uwindsor.ca
Tadej Todorović, University of Maribor, Slovenia, tadej.todorovic@um.si
Utami Widiati, Universitas Negeri Malang (State University of Malang), Indonesia, utami.widiati@um.ac.id
Evynurul Laily Zen, Universitas Negeri Malang (State University of Malang), Indonesia, evynurullailyzen@um.ac.id
Simon Zupan, University of Maribor, Slovenia, simon.zupan@um.si

GUIDELINES FOR CONTRIBUTORS

ELOPE: English Language Overseas Perspectives and Enquiries

ELOPE publishes original research articles, studies and essays that address matters pertaining to the English language, literature, teaching and translation.

Submission of Manuscripts

Manuscripts should be submitted for blind review in electronic form using the Faculty of Arts (University of Ljubljana) OJS platform (https://journals.uni-lj.si/elope/about/submissions). Only one contribution by the same author per volume will be considered. Each paper should be accompanied by abstracts in English and Slovene and by keywords. Abstracts by non-native speakers of Slovene will be translated into Slovene by ELOPE.
Please be sure to have a qualified native speaker proofread your English-language article. The suggested length of manuscripts is between 5,000 and 8,000 words.

Manuscript Style and Format

The manuscript should be in the following format:
• title in English (no longer than 100 characters including spaces),
• abstracts in English and Slovene (each no longer than 150 words) and up to five keywords,
• the text divided into an introduction, the body of the paper (possibly subdivided) and a conclusion.

The text should preferably be submitted in Word format (OpenOffice and RTF files are also acceptable). Please observe the following:
• 12-point Times New Roman font,
• 2.5 cm page margins on all sides,
• 1.5 line spacing,
• left text alignment,
• brief footnotes (up to 300 words per page; 10-point Times New Roman font).

For practical style and formatting queries, please see the articles in the latest online issue or contact the technical editor.

References

References should comply with the author-date system of The Chicago Manual of Style (18th edition, 2024). A quick guide to CMS is available at https://www.chicagomanualofstyle.org/tools_citationguide/citation-guide-2.html.

Final Note

Please note that only manuscripts fully adhering to the ELOPE Guidelines for Contributors will be considered for publication.

ELOPE Vol. 22, No.
1 (2025)

Guest Editors
Tomaž Onič, University of Maribor, Slovenia
David Hazemali, University of Maribor, Slovenia
Mladen Borovič, University of Maribor, Slovenia

Journal Editors
Smiljana Komar, University of Ljubljana, Slovenia
Mojca Krevel, University of Ljubljana, Slovenia

Editorial Board
Lisa Botshon, University of Maine at Augusta, United States of America; Biljana Čubrović, University of Belgrade, Serbia; Michael Devine, Acadia University, Canada; Dušan Gabrovšek, University of Ljubljana, Slovenia; Michelle Gadpaille, University of Maribor, Slovenia; Meta Grosman, University of Ljubljana, Slovenia; Allan James, University of Klagenfurt, Austria; Victor Kennedy, University of Maribor, Slovenia; Bernhard Kettemann, University of Graz, Austria; Alberto Lázaro, University of Alcalá de Henares, Spain; J. Lachlan Mackenzie, VU University Amsterdam, Netherlands; Tomaž Onič, University of Maribor, Slovenia; Roger D. Sell, Åbo Akademi University, Finland; Andrej Stopar, University of Ljubljana, Slovenia; Rick Van Noy, Radford University, United States of America; Terri-ann White, University of Western Australia, Australia

Editorial Secretary
Gašper Ilc, University of Ljubljana, Slovenia

Technical Editor
Andrej Stopar, University of Ljubljana, Slovenia

Proofreading
Michelle Gadpaille

Editorial Policy
ELOPE: English Language Overseas Perspectives and Enquiries is a double-blind, peer-reviewed academic journal that publishes original research articles, studies and essays addressing matters pertaining to the English language, literature, teaching and translation. The journal promotes the discussion of linguistic and literary issues from theoretical and applied perspectives, regardless of school of thought or methodology. Covering a wide range of issues and concerns, ELOPE aims to investigate and highlight the themes explored by contemporary scholars in the diverse fields of English studies.
Published by
University of Ljubljana Press
Založba Univerze v Ljubljani
For the Publisher: Gregor Majdič, Rector of the University of Ljubljana

Issued by
Slovene Association for the Study of English
Slovensko društvo za angleške študije
Department of English, Faculty of Arts, University of Ljubljana
Oddelek za anglistiko in amerikanistiko, Filozofska fakulteta, Univerza v Ljubljani
Ljubljana University Press, Faculty of Arts
Znanstvena založba Filozofske fakultete Univerze v Ljubljani
For the Issuer: Mojca Schlamberger Brezar, Dean of the Faculty of Arts, University of Ljubljana

The journal is published with support from the Slovenian Research and Innovation Agency. The publication is free of charge.

Universal Decimal Classification (UDC): Kristina Pegan Vičič

Journal Design: Gašper Mrak

Cover: Marjan Pogačnik, Zimsko cvetje (Winter Flowers), 1994; 7.6 × 10.0 cm; colour etching, deep relief. Owner: National Gallery, Ljubljana. Photo: Bojan Salaj, National Gallery, Ljubljana.

Printed by Birografika Bori

Number of Copies: 110

https://doi.org/10.4312/elope.22.1

Online ISSN: 2386-0316
Print ISSN: 1581-8918

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.