University of Ljubljana Press
Založba Univerze v Ljubljani
Ljubljana, 2025
ISSN 1581-8918

English Language Overseas Perspectives and Enquiries
Vol. 22, No. 1 (2025)

RETHINKING ENGLISH STUDIES THROUGH AI: CHALLENGES, ETHICS, AND INNOVATION

Editors of ELOPE Vol. 22, No. 1: Tomaž ONIČ, David HAZEMALI and Mladen BOROVIČ
Journal Editors: Smiljana KOMAR and Mojca KREVEL

CIP - Kataložni zapis o publikaciji
Narodna in univerzitetna knjižnica, Ljubljana

811.111(082)

RETHINKING English studies through AI : challenges, ethics, and innovation / editors Tomaž Onič, David Hazemali and Mladen Borovič. - Ljubljana : University of Ljubljana Press = Založba Univerze, 2025. - (ELOPE : English language overseas perspectives and enquiries, ISSN 1581-8918 ; vol. 22, no. 1)

ISBN 978-961-297-608-8
COBISS.SI-ID 239835651

Contents

PART I: INTRODUCTION

Tomaž Onič, Mladen Borovič, David Hazemali
9   Rethinking English Studies Through AI: Challenges, Ethics, and Innovation

PART II: LANGUAGE

Tadej Todorović, Andrej Flogie, Daniel Hari
19  Generative AI in Pragmatics: Assessing the Accuracy of Automated Speech Act Classification in Pinter’s The Birthday Party
    Generativna umetna inteligenca v pragmatiki: analiza natančnosti samodejne klasifikacije govornih dejanj v Pinterjevi drami Zabava za rojstni dan

Agata Križan, Aja Barbič
35  Appraisal Analysis and AI Chatbots: Do We Even Need Humans?
    Analiza jezika vrednotenja in pogovorni sistemi: ali ljudi sploh potrebujemo?
PART III: ACADEMIC WRITING

Silvana Neshkovska
55  The Benefits and Risks of AI-Assisted Academic Writing: Insights from Current Research
    Prednosti in tveganja pri znanstvenem pisanju s pomočjo umetne inteligence: spoznanja iz aktualnih raziskav

Rashmika Lekamge, Clayton Smith
69  Impact of Auto-Correction Features in Text-Processing Software on the Academic Writing of ESL Learners
    Vpliv funkcije samodejnega popravljanja v programih za urejanje besedil na akademsko pisanje učencev in učenk angleščine kot drugega tujega jezika

Tommy Hastomo, Andini Septama Sari, Utami Widiati, Francisca Maria Ivone, Evynurul Laily Zen, Muhammad Fikri Nugraha Kholid
93  Does Student Engagement with Chatbots Enhance English Proficiency?
    Ali uporaba pogovornih sistemov prispeva k izboljšanju znanja angleščine pri študentih in študentkah?

PART IV: ENGLISH LANGUAGE AND LITERATURE TEACHING

Saša Jazbec, Bernarda Leva, Marta Licardo
113 AI Is Here to Stay: An Empirical Study of Attitudes Among Teachers of English and German
    UI je prišla in bo ostala: empirična raziskava o stališčih učiteljev in učiteljic angleščine in nemščine

Bojan Prosenjak, Eva Jakupčević
133 Attitudes of Primary and Secondary EFL Teachers in Croatia Towards the Use of AI in Classroom Settings
    Pogled osnovno- in srednješolskih učiteljev in učiteljic angleščine kot tujega jezika na Hrvaškem na uporabo UI pri pouku

PART V: TRANSLATION STUDIES

Nataša Gajšt
153 Applications of AI-driven Tools in Translating and Drafting Commercial Correspondence – A Slovenian-English Perspective
    Uporaba orodij umetne inteligence pri prevajanju in sestavljanju poslovnih dopisov – slovensko-angleški vidik

Simon Zupan, Zmago Pavličič, Melanija Larisa Fabčič
171 Machine Translation of Independent Nominal Phrases in Technical Texts
    Strojno prevajanje samostojnih samostalniških besednih zvez v tehničnih besedilih

Marija Brala Vukanović
185 Translating (Metaphors) in the Age of AI: Opportunities, Challenges, and Implications for the EFL Classroom
    Prevajanje (metafor) v dobi umetne inteligence: priložnosti, izzivi in posledice za učilnico angleščine kot tujega jezika

Ghodrat Hassani, Marziyeh Malekshahi, Hossein Davari
203 AI-Powered Transcreation in Global Marketing: Insights from Iran
    Transkreacija z umetno inteligenco v globalnem marketingu: spoznanja iz Irana

LIST OF CONTRIBUTORS

Part I: Introduction

Rethinking English Studies Through AI: Challenges, Ethics, and Innovation

Tomaž Onič, Mladen Borovič, David Hazemali
University of Maribor, Slovenia

1 Introduction

Over a relatively brief period, the rapid development of artificial intelligence (AI) has reshaped our perception of traditional concepts that have been with us for decades, some even for centuries. Communication, language, academic writing, translation, and education have not escaped that transformation. Generative AI tools, particularly chatbots that use large language models (LLMs) to generate tailor-made texts, presentations, images, and videos, have entered the classroom, academic research, and various professional settings – often faster than our pedagogical frameworks and ethical standards can adapt. As these technologies grow more sophisticated and accessible, it has become impossible for the various fields of English studies – encompassing linguistics, writing, teaching, and translation – to ignore their presence or avoid their impact.

This thematic issue of ELOPE responds to this phenomenon. It brings together eleven original research articles that critically and creatively engage with the implications of AI for English language use, learning, and mediation. The volume addresses a range of issues and contexts: from pragmatic annotation in literary texts to metaphor translation in the EFL classroom, and from ESL writing development to teacher perceptions of AI tools. The contributions draw the reader’s attention to both the advantages and the pitfalls of integrating artificial intelligence into the various fields of English studies.
The influence of AI on education is undeniably evident on many levels of teaching and learning. One of its salient aspects is the ability to individualize instruction: AI-driven platforms analyse each learner’s performance and recommend tailor-made exercises, readings, or feedback, which – according to Msambwa, Wen, and Daniel (2025) and Massaty, Fahrurozi, and Budiyanto (2024) – can sustain motivation and enhance progression. Another function of AI systems involves fostering critical present-day skills, such as computational thinking or complex problem-solving, through modern teaching and learning approaches like just-in-time guidance or scaffolded challenges, i.e., problems structured to gradually increase in difficulty or complexity, with support (or scaffolding) provided along the way (Massaty, Fahrurozi, and Budiyanto 2024). Moreover, AI has expanded access to automated analysis and language support (Krishnan and Zaini 2025), which is not limited to English studies but interconnects with other disciplines. AI can also create collaborative learning environments by moderating group discussions, supporting peer-to-peer interaction, and converting static materials into adaptive simulations (Msambwa, Wen, and Daniel 2025; Orlanda-Ventayen 2024). Kusmiadi and Wahyudin (2024) also report that, behind the scenes, administrative activities like grading, attendance monitoring, and early-alert systems are increasingly automated, supposedly freeing more time for teaching staff to focus on innovative curriculum design and individualized mentorship.

Yet all these improvements raise new and relevant concerns. Apart from the privacy and security issues raised by Yu et al. (2024) or Asad et al.
(2024), which accompany the collection and analysis of student data, the use of AI opens the door to algorithmic biases that can skew recommendations, potentially privileging certain learners while marginalizing others (Cui and Alias 2024). Researchers also caution against overreliance on AI, which can eventually diminish deeper cognitive engagement as students elect to leave critical thinking to the machines (Butson and Spronken-Smith 2024; Castillo-Martínez et al. 2024). Ethical questions regarding authorship and academic integrity further complicate AI’s role in writing and assessment (Floridi 2023; Butler and Jiang 2025). These challenges are especially acute where limited digital infrastructure and low digital literacy might deepen existing inequalities (Asad et al. 2024; Nguyen and Hoang 2025).

In research contexts, AI accelerates the process by analysing large bodies of data, from historical archives to learner datasets, to identify patterns that are – owing to dataset size – potentially beyond human grasp (Cui and Alias 2024; Kusmiadi and Wahyudin 2024). In their study based on historical document analysis, Hazemali et al. (2024) demonstrated that AI excels at select surface-level processing and data extraction, but falters on tasks demanding interpretation, context sensitivity, or inference. Additionally, AI-assisted writing tools streamline drafting, editing, and literature synthesis, yet they require careful human oversight to maintain scholarly rigour and guard against “black-box” errors, as suggested by Castillo-Martínez et al. (2024) and Ramirez and Esparrell (2024). These capabilities support new methodologies based on (big) data, such as adaptive experimental designs, large-scale sentiment analyses, and interdisciplinary collaborations (Jacques, Moss, and Garger 2024; Orlanda-Ventayen 2024). Yet they also open methodological and ethical questions: how can we assure replicability if algorithms continually develop and change?
Who merits authorship credit for AI-(co-)authored output? To what extent must AI’s internal logic be disclosed, particularly when privacy or intellectual property are at stake (Butson and Spronken-Smith 2024; Yu et al. 2024)?

As this review of recent educational and research developments shows, there is an urgent need for comprehensive ethical and policy frameworks. Institutions must balance AI-mediated automation with rigorous human oversight to protect privacy and academic integrity (Floridi 2023; Ali et al. 2024; Yu et al. 2024), while at the same time promoting training in digital literacy and ensuring that the benefits of AI are not limited to small groups of learners and researchers but are accessible to all (Kusmiadi and Wahyudin 2024; Yu et al. 2024) – one of the crucial tasks of the humanities in the digital world.

2 Overview of the Studies

The articles in this issue are grouped into four thematic clusters – Language, Academic Writing, English Language and Literature Teaching, and Translation Studies – each addressing a particular aspect of AI and its growing role in our work. The boundaries between these clusters are, of course, neither strict nor hermetic, since the issues often venture into interdisciplinary areas. The present volume offers an insight into how scholars, educators, and practitioners can engage with AI not merely as a tool, but as a stimulus for rethinking core assumptions and professional practices in English studies.

2.1 Language: AI as a Tool for Language Analysis

The first two articles investigate the application of generative AI in linguistic analysis. The opening study by Tadej Todorović, Andrej Flogie and Daniel Hari tests ChatGPT, Gemini, and DeepSeek for speech act classification in Harold Pinter’s The Birthday Party.
With an accuracy of 82% under optimized conditions, the results affirm AI’s potential for supporting discourse annotation – particularly when prompts are paired with theoretical grounding, a practice increasingly advocated in AI-assisted humanities research (Lozić and Štular 2023). The second article, by Agata Križan and Aja Barbič, applies Martin and White’s appraisal framework to AI-generated analysis of evaluative language. The coding results produced by ChatGPT and Microsoft Copilot were compared and then reviewed by human analysts, revealing an encouraging overlap in basic categorization but a lack of nuance in the AI-generated responses. This reflects a recurring challenge in AI-driven textual analysis: the tendency to prioritize formal correctness over content accuracy or critical precision (Gonzalez Garcia and Weilbach 2023).

2.2 Academic Writing: Supporting Writing with AI

Three articles address AI’s impact on student writing and engagement. In the first, Silvana Neshkovska reviews the literature on ChatGPT’s role in academic writing. While highlighting benefits in autonomy and motivation, the study warns against the ethical pitfalls of AI overuse. The blurred lines between assistance and authorship remain a pressing concern, particularly in educational contexts where writing is also a process of knowledge construction (Altmäe, Sola-Leyva, and Salumets 2023; Abadie, Chowdhury, and Mangla 2024; Asad et al. 2024). The second article in this section, by Rashmika Lekamge and Clayton Smith, explores how learners of English as a Second Language (ESL) interact with auto-correction tools such as the one provided in Microsoft Word. While the tools reduced surface-level errors, extended reliance on them led to lower self-editing skills and writing confidence – a dynamic mirrored in recent AI-based writing support tools (Kasneci et al. 2023; Kohnke, Zou, and Su 2025).
In a study of Indonesian university students, Tommy Hastomo, Andini Septama Sari, Utami Widiati, Francisca Maria Ivone, Evynurul Laily Zen, and Muhammad Fikri Nugraha Kholid show that chatbot engagement, particularly behavioural and cognitive, correlates with improved English proficiency. This confirms emerging research suggesting that AI tools can support language acquisition and enhance vocabulary, grammar, and writing fluency if engagement is active, reflective, and task-focused (Ali et al. 2024; Krishnan and Zaini 2025).

2.3 English Language and Literature Teaching: Teacher Attitudes, Competence, and Professional Development

In this section, two studies explore how language educators respond to AI in the classroom. A survey conducted by Saša Jazbec, Bernarda Leva and Marta Licardo among Slovenian teachers finds that while AI is mostly not viewed as a threat, it is seen as a disruptor – requiring shifts in instructional design and professional identity. This echoes recent concerns about the social and psychological effects of AI in education (Suchithra and Arya 2025; Kasneci et al. 2023) and is consistent with Krishnan and Zaini’s (2025) conclusion that AI’s potential can be realized only when educators are well-trained and supported in its use. A survey of Croatian EFL teachers by Bojan Prosenjak and Eva Jakupčević likewise reveals mixed levels of digital competence. Professional development is therefore essential – not only for skill-building but for helping educators and pre-service teachers form balanced, critical views of AI. This same goal is reinforced by Butler and Jiang (2025), who found that less confident users of ChatGPT were more likely to accept its output uncritically.
2.4 Translation Studies: Exploring AI’s Role in Language Mediation

The final section, containing four articles, examines translation issues and practices in the new context of AI presence. In the first of the four articles, Nataša Gajšt examines business correspondence translated with the help of ChatGPT, Claude, and Gemini. The author concludes that while the output was mostly usable, inconsistencies in tone and register demonstrate the need for human editorial judgment – a finding echoed in other recent research outside the area of translation (e.g., Hazemali et al. 2024). Another study, by Simon Zupan, Zmago Pavličič and Melanija Larisa Fabčič, explores machine translation of nominal phrases in technical texts. With nearly half the phrases mistranslated, the study exposes the limits of current LLMs in high-density, context-dependent language – a familiar challenge for AI language models that, according to Boros et al. (2024), still struggle with specialized corpora. Unsurprisingly, metaphor translation presents another difficulty that AI cannot yet successfully address. While the students in the experiment appreciated using AI tools, errors in figurative language revealed the tools’ limitations. The author, Marija Brala Vukanović, however, argues that these inaccuracies can be turned into didactic benefits under the guidance of a skilled teacher. The section closes with a study on AI-powered transcreation in cross-cultural marketing. Surprisingly, the authors Ghodrat Hassani, Marziyeh Malekshahi and Hossein Davari find that trained students outperformed professionals after using ChatGPT tools, which underlines the importance of quality prompt engineering and guided learning for an optimal outcome. According to Gonzalez Garcia and Weilbach (2023), this is particularly relevant in domains where cultural resonance is as crucial as linguistic accuracy.
3 Conclusion

The contributions to this special issue collectively show that artificial intelligence is no longer a peripheral novelty but a pertinent phenomenon that has already won a visible position in English studies. We can expect its relevance and status to grow stronger and more central in the future, regardless of the discipline or subfield of English studies – a view reflected in these articles, which offer both a critical and a constructive account of AI’s growing influence. Apart from this general understanding, the studies reach another shared conclusion: AI tools are only as effective and ethical as the human users who operate them, drawing on their own expertise and ethical judgment. It is therefore crucial to strive for a thoughtful and responsible integration of AI in academic and professional work.

This issue of ELOPE does not seek to offer final answers but rather to open new questions and inquiries. Teachers, researchers, translators, and others engaged in English studies are uniquely positioned to shape the newly emerging relationship between language and technology. The questions raised here – about accuracy, agency, pedagogy, and professional roles – will continue to define our fields in the years ahead. It is our hope that this collection provides a valuable foundation for those navigating, critiquing, and contributing to the future of AI in English language studies.

References

Abadie, Amelie, Soumyadeb Chowdhury, and Sachin Kumar Mangla. 2024. “A shared journey: Experiential perspective and empirical evidence of virtual social robot ChatGPT’s priori acceptance.” Technological Forecasting and Social Change 201: 123202. https://doi.org/10.1016/j.techfore.2023.123202.

Ali, Omar, Peter A. Murray, Mujtaba Momin, Yogesh K. Dwivedi, and Tegwen Malik. 2024.
“The effects of artificial intelligence applications in educational settings: Challenges and strategies.” Technological Forecasting and Social Change 199: 123076. https://doi.org/10.1016/j.techfore.2023.123076.

Altmäe, Signe, Antonio Sola-Leyva, and Andres Salumets. 2023. “Artificial intelligence in scientific writing: A friend or a foe?” Reproductive BioMedicine Online 47 (1): 3–9. https://doi.org/10.1016/j.rbmo.2023.04.009.

Asad, Muhammad Mujtaba, Shafaque Shahzad, Syed Hassan Ali Shah, Fahad Sherwani, and Norah Mansour Almusharraf. 2024. “ChatGPT as artificial intelligence-based generative multimedia for English writing pedagogy: Challenges and opportunities from an educator’s perspective.” International Journal of Information and Learning Technology 41 (5): 490–506. https://doi.org/10.1108/ijilt-02-2024-0021.

Boros, Emanuela, Maud Ehrmann, Matteo Romanello, Sven Najem-Meyer, and Frédéric Kaplan. 2024. “Post-correction of historical text transcripts with large language models: An exploratory study.” In Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024), St. Julians, Malta, 133–59. Association for Computational Linguistics. https://aclanthology.org/2024.latechclfl-1.14.pdf.

Butler, Yuko G., and Shiyu Jiang. 2025. “How do pre-service language teachers perceive generative AIs’ affordance?: A case of ChatGPT.” System 129: 103606. https://doi.org/10.1016/j.system.2025.103606.

Butson, Russell, and Rachel Spronken-Smith. 2024. “AI and its implications for research in higher education: A critical dialogue.” Higher Education Research & Development 43 (3): 563–77. https://doi.org/10.1080/07294360.2023.2280200.

Castillo-Martínez, Isolda Margarita, Daniel Flores-Bueno, Sonia M. Gómez-Puente, and Victor O. Vite-León. 2024. “AI in higher education: A systematic literature review.” Frontiers in Education 9: 1391485. https://doi.org/10.3389/feduc.2024.1391485.
Cui, Pengfei, and Bity Salwana Alias. 2024. “Opportunities and challenges in higher education arising from AI: A systematic literature review (2020–2024).” Journal of Infrastructure, Policy and Development 8 (11): 8390. https://doi.org/10.24294/jipd.v8i11.8390.

Dwivedi, Yogesh K., Laurie Hughes, Elvira Ismagilova, Gert Aarts, Crispin Coombs, Tom Crick, et al. 2021. “Artificial Intelligence (AI): Multidisciplinary Perspectives on Emerging Challenges, Opportunities, and Agenda for Research, Practice and Policy.” International Journal of Information Management 57: 101994. https://doi.org/10.1016/j.ijinfomgt.2019.08.002.

Floridi, Luciano. 2023. The Ethics of Artificial Intelligence: Principles, Challenges, and Opportunities. Oxford University Press.

Gonzalez Garcia, Giselle, and Christian Weilbach. 2023. “If the sources could talk: Evaluating large language models for research assistance in history.” In CHR 2023: Computational Humanities Research Conference, December 6–8, 2023, Paris, France. https://doi.org/10.48550/arXiv.2310.10808.

Hazemali, David, Janez Osojnik, Tomaž Onič, Tadej Todorović, and Mladen Borovič. 2024. “Evaluating chatbot assistance in historical document analysis.” Moderna arhivistika 7 (2): 53–83. https://doi.org/10.54356/ma/2024/biub3010.

Jacques, Paul H., Hollye K. Moss, and John Garger. 2024. “A synthesis of AI in higher education: Shaping the future.” Journal of Behavioral and Applied Management 24 (2): 103–11. https://doi.org/10.21818/001c.122146.

Kasneci, Enkelejda, Kathrin Sessler, Stefan Küchemann, Maria Bannert, Daryna Dementieva, Frank Fischer, Urs Gasser, Georg Groh, Stephan Günnemann, Eyke Hüllermeier, Stephan Krusche, Gitta Kutyniok, Tilman Michaeli, Claudia Nerdel, Jürgen Pfeffer, Oleksandra Poquet, Michael Sailer, Albrecht Schmidt, Tina Seidel, Matthias Stadler, Jochen Weller, Jochen Kuhn, and Gjergji Kasneci. 2023.
“ChatGPT for good? On opportunities and challenges of large language models for education.” Learning and Individual Differences 103: 102274. https://doi.org/10.1016/j.lindif.2023.102274.

Kohnke, Lucas, Di Zou, and Fan Su. 2025. “Exploring the potential of GenAI for personalised English teaching: Learners’ experiences and perceptions.” Computers and Education: Artificial Intelligence 8: 100371. https://doi.org/10.1016/j.caeai.2025.100371.

Krishnan, Vekneswary, and Hafiz Zaini. 2025. “A systematic literature review on artificial intelligence in English language education.” International Journal of Research and Innovation in Social Science 9 (1): 82–88. https://doi.org/10.47772/IJRISS.2025.903SEDU0002.

Kusmiadi, Kusmiadi, and Didin Wahyudin. 2024. “The role of artificial intelligence (AI) software in education and research: A systematic literature review.” Journal of Vocational Education Studies 7 (2): 191–208. https://doi.org/10.12928/joves.v7i2.10387.

Lozić, Edisa, and Benjamin Štular. 2023. “Fluent but not factual: A comparative analysis of ChatGPT and other AI chatbots’ proficiency and originality in scientific writing for humanities.” Future Internet 15 (10): 336. https://doi.org/10.3390/fi15100336.

Massaty, Muhammad Hassan, Slamet Kurniawan Fahrurozi, and Cucuk Wawan Budiyanto. 2024. “The role of AI in fostering computational thinking and self-efficacy in educational settings: A systematic review.” Indonesian Journal of Informatics Education 8 (1): 52–64. https://doi.org/10.20961/ijie.v8i1.89596.

Msambwa, Msafiri Mgambi, Zhang Wen, and Daniel Kangwa. 2025. “The impact of AI on the personal and collaborative learning environments in higher education.” European Journal of Education 60: e12909. https://doi.org/10.1111/ejed.12909.

Nguyen, Thanh Huyen, and Thi Ngoc Hien Hoang. 2025. “Investigating the promises and perils of generative AI in EFL learning in higher education: A literature review.” AsiaCALL Online Journal 16 (1): 1–25.
https://doi.org/10.54855/acoj.251611.

Orlanda-Ventayen, Caren Casama. 2024. “Empowering education through transformative role of artificial intelligence (AI) in teaching and learning: Educators’ perspective and research trends.” In 9th International Conference on Information Technology and Digital Applications (ICITDA), Nilai, Negeri Sembilan, Malaysia, 1–5. IEEE. https://doi.org/10.1109/ICITDA64560.2024.10809596.

Ramirez, Elkin Arturo Betancourt, and Juan Antonio Fuentes Esparrell. 2024. “Artificial intelligence (AI) in education: Unlocking the perfect synergy for learning.” Educational Process International Journal 13 (1): 35–51. https://doi.org/10.22521/edupij.2024.131.3.

Suchithra, V. G., and C. S. Arya. 2025. “The study on ethics and biases in AI-powered education.” European Journal of Contemporary Education and E-Learning 3 (2): 37–43. https://doi.org/10.59324/ejceel.2025.3(2).04.

Yu, Ji Hyun, Devraj Chauhan, Rubaiyat Asif Iqbal, and Eugene Yeoh. 2024. “Mapping academic perspectives on AI in education: Trends, challenges, and sentiments in educational research (2018–2024).” Educational Technology Research and Development 73: 199–227. https://doi.org/10.1007/s11423-024-10425-2.

Part II: Language

2025, Vol. 22 (1), 19-34(228)
journals.uni-lj.si/elope
https://doi.org/10.4312/elope.22.1.19-34
UDC: [81’33:821.111.09-2]:004.89

Tadej Todorović, Andrej Flogie, Daniel Hari
University of Maribor, Slovenia

Generative AI in Pragmatics: Assessing the Accuracy of Automated Speech Act Classification in Pinter’s The Birthday Party

ABSTRACT

This study explores the feasibility of using generative AI to automate speech act annotation in Harold Pinter’s play The Birthday Party. Three chatbots – ChatGPT, Gemini, and DeepSeek – were tested under three scenarios varying in the amount of theoretical material provided.
Each chatbot’s output was compared to a manually annotated reference via a Python script measuring classification accuracy. Scenario 2 produced the highest accuracy overall (75–82%), while Scenario 1 underperformed, owing to incorrect reliance on external typologies, and Scenario 3 showed signs of overfitting. ChatGPT o1 emerged as the most accurate model, achieving 82% accuracy in Scenario 2. The findings suggest that GenAI chatbots can serve as valuable preliminary annotators when good prompt engineering and well-curated theoretical material are provided. Future research could extend this methodology to more context-dependent texts, further refining prompt-engineering strategies and exploring larger linguistic corpora.

Keywords: pragmatics, speech act analysis, ChatGPT, DeepSeek, Gemini, Pinter

Generativna umetna inteligenca v pragmatiki: analiza natančnosti samodejne klasifikacije govornih dejanj v Pinterjevi drami Zabava za rojstni dan

IZVLEČEK

Študija raziskuje smiselnost rabe generativne umetne inteligence (ChatGPT, Gemini in DeepSeek) za avtomatizacijo anotacije govornih dejanj v Pinterjevi drami Zabava za rojstni dan. Trije klepetalni roboti – ChatGPT, Gemini in DeepSeek – so bili testirani v treh scenarijih, ki so se razlikovali glede na obseg predloženega teoretičnega gradiva. Rezultati vsakega klepetalnega robota so bili primerjani z ročno anotirano različico s pomočjo Python skripte, ki je izmerila natančnost klasifikacije. Scenarij 2 je na splošno dosegel najvišjo natančnost (75–82 %), medtem ko je bil scenarij 1 zaradi neustreznega zanašanja na tuje tipologije preslab, scenarij 3 pa je kazal znake preprileganja (angl. overfitting). ChatGPT o1 se je izkazal za najnatančnejši model, saj je v scenariju 2 dosegel 82-odstotno zanesljivost. Ugotovitve kažejo, da lahko klepetalni roboti GEN-UI služijo kot koristni predhodni anotatorji, če so na voljo dobro zasnovani pozivi in dobro pripravljeno teoretično gradivo.
Prihodnje raziskave bi lahko to metodologijo razširile na besedila, ki so bolj odvisna od konteksta, nadalje izpopolnile strategije inženiringa pozivov in raziskale večje jezikovne korpuse.

Ključne besede: pragmatika, analiza govornih dejanj, ChatGPT, DeepSeek, Gemini, Pinter

1 Introduction

In this paper, we examine the potential of generative artificial intelligence (GenAI) for assisting and optimizing research in pragmatics. Specifically, we focus on research using speech act analysis, a powerful tool that enables quantitative and qualitative insight into various pragmatic topics of interest, such as mediation (Kádár et al. 2024; House et al. 2024), small talk (House and Kádár 2023), and bargaining (Liu, House, and Kádár 2024). Such research can yield unique and robust results in pragmatics; however, the initial data collection process is often time-consuming, as it requires the researchers to manually annotate the data using some form of speech act typology. A tool such as GenAI that could either perform or, at the very least, facilitate this initial process would thus be especially beneficial for researchers in this field.

We thus performed a case study, testing the potential of selected GenAI tools (ChatGPT, Gemini, and DeepSeek) for annotating Harold Pinter’s early play The Birthday Party (Pinter 1991) using a finite speech act typology developed by Edmondson, House, and Kádár (Edmondson and House 1981; Edmondson, House, and Kádár 2023). We decided to test the chatbots’ capabilities using a literary work because historical documents, interviews, or other recordings usually require additional context for successful annotation, whereas a literary work is as close to a self-contained whole as possible.
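The accuracy figures reported in the abstract come from a Python script that compares each chatbot’s annotations against the manual reference. The script itself is not reproduced in the paper, so the following is only a minimal sketch of such a comparison: the data format (utterance-number/label pairs), the function name, and the speech act labels are all illustrative assumptions, not the authors’ actual implementation.

```python
# Sketch of an annotation-accuracy check. Each annotation is assumed to be
# a list of (utterance_id, speech_act_label) pairs; the labels below are
# invented for illustration and do not reproduce the Edmondson/House/Kádár
# typology used in the study.

def annotation_accuracy(reference, predicted):
    """Share of utterances whose predicted speech act matches the manual label."""
    ref = dict(reference)
    pred = dict(predicted)
    # Compare only utterances present in both versions, since a chatbot
    # may skip or merge utterances.
    shared = ref.keys() & pred.keys()
    if not shared:
        return 0.0
    hits = sum(ref[u] == pred[u] for u in shared)
    return hits / len(shared)

# Toy example: the chatbot mislabels one of four utterances.
manual  = [(1, "Request"), (2, "Tell"), (3, "Greet"), (4, "Suggest")]
chatbot = [(1, "Request"), (2, "Tell"), (3, "Opine"), (4, "Suggest")]

print(annotation_accuracy(manual, chatbot))  # 0.75
```

Restricting the comparison to utterances covered by both annotations is one design choice among several; a stricter script might instead count missing utterances as errors.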
Besides identifying the most appropriate chatbot for the task, we also set out to determine the best prompt (prompt engineering) that researchers could use for this work. In doing so, we developed three scenarios for testing each chatbot: (1) instructing the chatbot to research the speech act typology online, providing it with a short annotated excerpt from The Birthday Party, and then instructing it to annotate the remainder of Act One; (2) providing the chatbot with a short description of the speech act typology (20 pages) and with a short annotated excerpt from The Birthday Party, and then instructing it to annotate the remainder of Act One; and (3) providing the chatbot with an exhaustive description of the speech act typology (80 pages) and with a short annotated excerpt from The Birthday Party, and then instructing it to annotate the remainder of Act One. Finally, we analysed the results by comparing the automatic annotations to the version of The Birthday Party manually annotated by human experts. In doing so, we endeavoured to answer the following research questions:

1. How successful are chatbots in providing an automatically annotated text in line with the given speech act typology?
2. Which scenario yields the best outcome (the highest fidelity to manual annotation)?
3. Are chatbots useful for performing such preliminary annotations or, at the very least, facilitating this process?

2 Related Work

GenAI chatbots have been recognized as useful in many domains, including time-consuming tasks such as literature reviews, citation management, proofreading, summarizing, and paraphrasing (Stokel-Walker 2023; Else 2023; Altmäe, Sola-Leyva, and Salumets 2023). They have been tested in various academic fields, such as machine translation of literary works (Mohar, Orthaber, and Onič 2020) and assistance in analysing historical documents (Hazemali
2024) with varying degrees of success. In the field of pragmatics, there are several studies that examine chatbots themselves and their utterances, such as exploring Gricean Maxims to help inform the basic design of effective conversational interaction (Setlur and Tory 2022), using AI-generated conversations as human-like data for pragmatic analysis (Chen, Li, and Ye 2024), and investigating the politeness strategies of chatbots (Monteiro, Pereira, and Salgado 2023). In the field of speech act analysis, a recent study has examined whether chatbots are capable of assertion (Williams and Bayne 2024). Most of these studies focus on studying GenAI and generating new knowledge by examining their behaviour, which we believe to be a worthy endeavour; however, chatbots can also be useful as facilitators of research in sometimes painstakingly slow processes, such as annotating large volumes of text, utterance by utterance, using a specific speech act typology. To our knowledge, no one has yet tried to use the capabilities of chatbots to aid in the analysis of such data.
3 Data and Methodology
3.1 Data
Our data includes a manual annotation of Harold Pinter’s The Birthday Party (Pinter 1991), one of Pinter’s most frequently performed works, with some critics ranking it among the greatest dramatic achievements of British theatre (Hribar 2004; Gavez 2016; Onič 2016). The play follows the events unfolding in a boarding house in an English seaside town, run by Meg and Petey Boles. It begins with an ostensibly mundane breakfast conversation between Meg and Petey and eventually transforms into a psychological play in which two strangers, Goldberg and McCann, arrive at the boarding house, searching for Stanley, one of the “permanent” guests. We chose this play for three reasons.
First, we wanted to analyse chatbots’ capabilities in analysing a literary work, which, compared to historical documents or diplomatic transcripts, is as close to a self-contained whole as possible. A speech act annotation of historical documents requires additional outside context, i.e., knowledge of the complex political situation during which the analysed discourse took place, so that the researchers can attribute the correct speech acts to participants based on both their statements and their motivations in the context of the political situation (insofar as this information is known). For example, in the case of a speech act annotation of a mediation event between the EEC and the Slovenian and Croatian states, the authors first had to familiarize themselves with the political context that surrounded that mediation attempt (Kádár et al. 2024). Second, among the various types of literary works, plays are most appropriate for speech act annotation and subsequent analysis, as they feature (almost exclusively) direct speech, whereas other forms of literature, like novels or short stories, also include the narrative voice, which cannot be analysed in this way. Moreover, the resemblance to ordinary everyday conversation that contains elements of naturally occurring discourse, such as hesitations, repetitions, self-corrections, or non-sequiturs, is highest in contemporary drama (Podbevšek and Žavbi 2021; Onič and Prajnč Kacijan 2020), as opposed to, for example, the language of Elizabethan drama, which is highly poetic. Third, The Birthday Party is one of the most exemplary absurdist plays; additionally, Pinter’s use of dialogue is characterized as “standard English, but the conversation doesn’t get anywhere”
(Schechner 1966, 176): i.e., Pinter’s dramatic dialogue adheres to certain established everyday paradigms, typically English small-talk clichés, in which certain regularities apply which, whether written or unwritten, are quite firmly rooted in the English tradition (Onič 2016). This makes Pinter’s plays, and The Birthday Party in particular, ideal material for a pragmatic analysis that utilizes this speech act typology.
3.2 Methodology
The purpose of the paper is to establish which AI tool is best suited for analysing and annotating a text, in our case a play, using a finite speech act typology developed by Edmondson, House, and Kádár (Edmondson and House 1981; Edmondson, House, and Kádár 2023). We chose this speech act typology because, compared to other speech act typologies, it is finite, which prevents the invention of new speech acts. This allows for comparison of different texts and ensures replicability of the research thus produced (Kádár et al. 2024). Furthermore, the typology has been widely and successfully used in pragmatics, as evidenced by the influential works of various authors in the field (House 1996; Edmondson, House, and Kádár 2023; Taguchi and Kádár 2025). To determine the best tool for the job, we compare the results (the annotated play) of three AI tools available on the market to a manual annotation of The Birthday Party. The AI tool producing an AI-generated version with the fewest discrepancies compared to the manual annotation will be considered the most appropriate; our methodology is thus fundamentally contrastive. For the annotation, we chose the following AI tools: ChatGPT (OpenAI), Gemini (Google), and DeepSeek. While both ChatGPT and Gemini use a transformer-based architecture, they nevertheless utilize different training data: ChatGPT uses a massive dataset of both text and human-annotated examples, whereas Gemini uses a proprietary dataset curated by Google.
Additionally, the two have different strengths: advanced language understanding in Gemini’s case and exceptional conversational ability in the case of ChatGPT (Rane, Choudhary, and Rane 2024). Both language understanding and conversational ability are variables relevant to pragmatics and might explain the differences in the final output. The final tool, DeepSeek, was added because of its lower cost of development and usage compared to ChatGPT and Gemini (DeepSeek-AI et al. 2025), even though it retains their capabilities and uses a new architecture – a more collective approach that uses a mixture of specialized neural networks working in conjunction rather than a single massive, unified AI system (Moors 2025). Furthermore, since DeepSeek is open-source and relatively easy to run because of its lower resource usage, it would be much easier to run locally and thus avoid various privacy and security concerns related to using generative AI tools. The selected tools have different context windows: ChatGPT supports contextual lengths of up to 128,000 tokens, Gemini up to one million tokens, and DeepSeek up to 163,840 tokens. This allows for the processing of very long documents or complex conversations while preserving the full context, which is ideal for our study and, if our study is successful, for the analysis of entire corpora of texts using this approach (e.g., hundreds of absurdist plays). To simultaneously determine the best procedure (prompt engineering) for producing optimal output (an annotation of The Birthday Party that deviates the least from the manual annotation), we tested the three tools in three different scenarios.
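As a back-of-the-envelope illustration of these limits, the following sketch estimates whether a text fits a given context window, assuming roughly four characters per token (a common heuristic only; real tokenizers vary, and the dictionary keys and function names here are ours, not part of any tool’s API):

```python
# Rough sketch: estimate whether a text fits a model's context window.
# Assumes ~4 characters per token, a common heuristic; exact counts
# depend on the tokenizer, so treat this as an approximation only.

CONTEXT_WINDOWS = {          # token limits cited in the text above
    "ChatGPT": 128_000,
    "Gemini": 1_000_000,
    "DeepSeek": 163_840,
}

def estimated_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Crude token estimate from the character count."""
    return int(len(text) / chars_per_token)

def fits_context(model: str, text: str) -> bool:
    """True if the estimated token count fits the model's context window."""
    return estimated_tokens(text) <= CONTEXT_WINDOWS[model]
```

By this estimate, a 400,000-character script (about 100,000 tokens) would fit all three windows, while a 1,000,000-character corpus (about 250,000 tokens) would exceed the ChatGPT and DeepSeek limits but not Gemini’s.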
In the first scenario (1), we instructed the chatbot to research the specific speech act typology online, then uploaded a sample of the manual annotation (approximately 150 lines) for the chatbot to analyse, and finally instructed the chatbot to produce its own annotation of the remainder of Act one – the play was provided in .docx format, with each utterance numbered and on a new line; the manually annotated sample followed the same format. In the second scenario (2), we did not instruct the chatbot to research the typology but instead provided a short description of the typology and its speech acts (approximately 20 pages) from a referential work (Edmondson, House, and Kádár 2023), alongside the same sample of manual annotation that we used in the first scenario. The remaining instructions for the chatbot were the same: to produce its own annotation of the remainder of Act one. The third scenario (3) differed from the second in that we uploaded a much more exhaustive and comprehensive description of the speech act typology (approximately 80 pages) from the same source (Edmondson, House, and Kádár 2023), while the other steps remained the same as in the second scenario. We wanted to utilize the best available iterations of GenAI in use; however, different iterations have different capabilities: for instance, some allow uploading of texts and files, and some do not; some can research content online, while others cannot. To address such discrepancies, we had to slightly modify our prompts for each specific GenAI model used. Considering the rapid development of new iterations, we consider it essential to specify the iterations used in our study and their capabilities at this time. In testing ChatGPT, we intended to utilize the o1 iteration, which generates longer trains of thought before providing an answer (Wang et al. 2024). We decided against using ChatGPT o3, based on various technical and deployment-related parameters.
Although ChatGPT o3 is a newer model, it is currently a smaller, more latency- and cost-optimized “mini” version, which makes it less appropriate for complex linguistic and pragmatic tasks. In recent comparative studies (Raffel et al. 2023), such smaller and/or more resource-efficient models have yielded lower performance in deeper discourse understanding and reduced capacity for long-term context retention in comparison to larger models like o1 or GPT-4o. Furthermore, ChatGPT o1 supports a larger number of parameters dedicated to more advanced forms of reasoning and linguistic understanding (Wang et al. 2024), which we believe is crucial for tasks such as speech act classification. However, ChatGPT o1 cannot currently research topics online, so we could not use it for Scenario 1. Instead, we used ChatGPT 4o for Scenario 1 and the more powerful ChatGPT o1 for Scenarios 2 and 3 (this is noted in the results). Nor can ChatGPT o1 read PDFs or documents, so the materials (the manual annotation and the long and short theory) were provided in the prompt itself. In testing Gemini, we intended to utilize the 2.0 PRO Experimental iteration, yet, as with ChatGPT, different iterations offer different capabilities. We chose Gemini 2.0 PRO Experimental because, according to Google’s own documentation and independent analyses (Gemini Team et al. 2024; Chowdhery et al. 2022), it produces more advanced results in tasks related to logical reasoning and extended context retention – both of which are crucial for speech act analysis. However, because Gemini 2.0 PRO Experimental does not currently support online research, we utilized Gemini 2.0 Flash for Scenario 1.
In terms of technical specifications and reasoning capabilities, Gemini 2.0 PRO Experimental is also the closest model to ChatGPT o1, making a direct comparison of their outcomes the most methodologically sound approach. This enabled us to rule out effects stemming from significant differences in architecture or dataset size and focus instead on the models’ actual ability to classify speech acts. However, it has the same limitations as ChatGPT o1 – it cannot research topics online or read PDFs or documents. Similarly to the prompt engineering for ChatGPT, we adapted the prompts for Scenario 2 and Scenario 3 for Gemini 2.0 PRO Experimental by providing the materials in the prompt itself. For Scenario 1, we used Gemini 2.0 Flash, which can research topics online and read documents, so the prompt was not further adapted. In testing DeepSeek, we utilized the DeepSeek-R1 iteration, an open-source model that enables both online research and the upload of PDF and DOCX documents (DeepSeek-AI et al. 2025; Mercer, Spillard, and Martin 2025). Furthermore, DeepSeek-R1 is, compared to ChatGPT o1 and Gemini 2.0 PRO Experimental, free to use and requires fewer resources to operate. Like ChatGPT o1 and Gemini 2.0 PRO Experimental, it offers enhanced reasoning capabilities and can process longer texts, making a direct comparison among the three models (ChatGPT o1, Gemini 2.0 PRO Experimental, and DeepSeek-R1) fully justified. Another advantage of DeepSeek-R1 is its open-source nature and relatively low computational demands, allowing for simpler local deployment and thus direct protection of sensitive data (Mercer, Spillard, and Martin 2025). According to published benchmarks (DeepSeek-AI et al.
2025), DeepSeek-R1 achieves statistically similar results to closed-source solutions on comparable text-intensive tasks, i.e., it should deliver at least an equivalent level of accuracy for speech act classification, while being free to use and requiring fewer resources to operate compared to ChatGPT o1 and Gemini 2.0 PRO Experimental.
For the quantitative comparison between the manually annotated classification (reference text) and the AI-generated classifications, we developed a Python script that automatically performs the following:
• Reads .docx files containing both the reference classification and the AI-generated classifications.
• Compares the corresponding speech act for each line/utterance in the dialogue.
• Measures the similarity between sentences (using SequenceMatcher and the Hungarian algorithm1 for optimal line alignment).
• Aligns the reference and the AI-predicted speech act types and calculates the number of mismatches.
• Calculates the classification accuracy, i.e., the percentage of lines that were annotated correctly compared to the reference.
1 Also known as the Kuhn-Munkres algorithm (Kuhn 1955) – this is a classic algorithm for solving the assignment problem in combinatorial optimization, where the best match between elements of two sets is sought to maximize the total similarity or minimize the total distance.
4 Results and Discussion
We have divided the results into two parts. In the first, we present statistical data for all three scenarios, analysing the overall success rate of all three GenAI tools in this task and the general trends. In the second, we focus on a more detailed analysis of the mistakes made by the most successful chatbot, and the general trends of those mistakes, with specific examples from each speech act category.
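A minimal sketch of the core of such a comparison script, with greedy SequenceMatcher matching standing in for the Hungarian-algorithm alignment and plain Python lists standing in for the .docx parsing (all function names are ours, not the study’s code):

```python
from difflib import SequenceMatcher

def best_alignment(ref_utts, ai_utts):
    """Greedily pair each reference utterance with its most similar
    AI-annotated utterance (a simplified stand-in for the Hungarian
    algorithm's globally optimal assignment)."""
    pairs, used = [], set()
    for i, ref in enumerate(ref_utts):
        best_j, best_score = None, -1.0
        for j, ai in enumerate(ai_utts):
            if j in used:
                continue
            score = SequenceMatcher(None, ref, ai).ratio()
            if score > best_score:
                best_j, best_score = j, score
        used.add(best_j)
        pairs.append((i, best_j))
    return pairs

def classification_accuracy(reference, ai_annotated):
    """reference / ai_annotated: lists of (utterance, speech_act) tuples.
    Returns the share of aligned lines whose speech acts agree."""
    pairs = best_alignment([u for u, _ in reference],
                           [u for u, _ in ai_annotated])
    correct = sum(
        1 for i, j in pairs
        if reference[i][1].lower() == ai_annotated[j][1].lower()
    )
    return correct / len(reference)
```

For example, if a three-utterance reference is annotated Opine/Opine/Leave-Take and the AI output reads Tell/Opine/Greet for the same lines, the function returns an accuracy of one third.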
Unsurprisingly, the chatbots were least successful in Scenario 1, where they were instructed to conduct online research on the speech act typology and, with the help of the manually annotated example, annotate the text. Our main concern was that the chatbots would not be able to differentiate between the various speech act typologies online and choose the one we prescribed. We hoped that providing the manually annotated example would ground the chatbots and steer them to the correct speech act typology; unfortunately, this was not the case. Both ChatGPT 4o and Gemini 2.0 Flash utilized speech acts that were outside the prescribed typology, with ChatGPT classifying 537 utterances and Gemini 2.0 Flash classifying fifty-six utterances with categories outside the classification. Surprisingly, DeepSeek-R1 managed to utilize the correct typology, yet its accuracy was still only 29%. We believe this result can be explained by the fact that, for Scenario 1, we were forced to use less capable iterations of ChatGPT and Gemini (ChatGPT 4o instead of ChatGPT o1 and Gemini 2.0 Flash instead of Gemini 2.0 PRO Experimental) because of prompt limitations: the more capable models do not yet have online research enabled. We conjecture that using the more capable models would improve adherence to the correct typology, considering the similarities in the results in the other scenarios. That being said, the results of Scenario 1 indicate a clear winner: Gemini 2.0 Flash. Despite being a less capable model and despite assigning fifty-six utterances to speech act categories outside the prescribed typology, Gemini 2.0 Flash achieved an accuracy of 63%, which is more than twice as good as the other two chatbots, as shown in the table below.
Table 1. Accuracy of Chatbots in Scenario 1 (autonomous online research + manually annotated example).
                    Total speech acts   Correct classifications   Mismatched classifications   Accuracy (%)
ChatGPT 4o          762                 201                       561                          26
Gemini 2.0 Flash    762                 480                       282                          63
DeepSeek-R1         762                 221                       541                          29
Furthermore, a detailed examination of the results for Gemini 2.0 Flash in Scenario 1 indicates that it might have performed even better. It misclassified thirty-three instances of the speech act Request as Command. Commands are not in our speech act typology, yet they do belong under Request, so it could be argued that Gemini 2.0 Flash still correctly recognized the pragmatic intent behind these misclassifications. On the other hand, it misclassified relatively “easy” categories: for example, it misclassified all four instances of Leave-Take, a ritualistic speech act that signifies the termination of an encounter between two speakers. This is usually performed via tokens such as “Good night,” “Bye,” “See you,” “Cheerio,” etc. Nevertheless, Gemini 2.0 Flash classified three Leave-Takes, “Ta-ta, Mrs. Boles,” “Ta-ta,” and “Ta-ta, Stan,” as a Greet, a speech act utilized for acknowledging the presence of the interlocutor (“Hello,” “Hi,” etc.), and one Leave-Take, “See you later,” as a Resolve (an illocution used to express the speaker’s actions). Furthermore, it classified an instance of How-are-you (“How are you keeping, Mrs Boles?”), another ritualistic speech act, as a Request. When using chatbots to facilitate researchers’ analyses, we would require them to at least identify the “easy” ritualistic speech acts, such as Leave-Take and How-are-you, which are codified by only a few almost universal utterances.
The fact that Gemini 2.0 Flash, as the most successful GenAI tool in Scenario 1, failed to do that, in conjunction with the fact that it achieved only 63% total accuracy (still impressive, especially considering the much lower accuracy of ChatGPT 4o and DeepSeek-R1), means that using this scenario would not aid researchers in their work. Scenario 2 was much more successful, producing the most accurate classifications of all three scenarios across all chatbots. The chatbots were instructed to use only a short excerpt of the theory, which resulted in no “phantom” speech act classifications – all three chatbots used only the appropriate twenty-five speech acts from the finite speech act typology in their annotation. All three chatbots were comparable in their results, with ChatGPT o1 emerging on top, having achieved an impressive accuracy of 82%, followed closely by DeepSeek-R1 with 79% accuracy, and Gemini 2.0 PRO Experimental with 75% accuracy. The table below summarizes the success rate of each chatbot.
Table 2. Accuracy of Chatbots in Scenario 2 (short theory + manually annotated example).
                              Total speech acts   Correct classifications   Mismatched classifications   Accuracy (%)
ChatGPT o1                    762                 623                       139                          82
Gemini 2.0 PRO Experimental   762                 575                       187                          75
DeepSeek-R1                   762                 600                       162                          79
ChatGPT o1, the most successful chatbot in this scenario and overall, misclassified only 139 speech acts. Of those, it struggled the most with Opines, which were classified as Tells (39), Remarks (4), Suggests (4), and Complains (2); Resolves, which it classified as Tells (12), Remarks (2), Requests (2), Willings (1), and Promises (1); Complains, misclassified as Requests (8), Remarks (4), Opines (3), and Tells (3); and Remarks, misclassified as Requests (7), Tells (2), Opines (2), and Complains (1). The entire table of misclassifications for ChatGPT o1 in Scenario 2 is presented below.
DeepSeek-R1 misclassified Tells as Discloses (15), Opines (7), Requests (2), Thanks (1), Remarks (1), and Complains; Opines as Tells (14), Complains (12), Requests (4), Remarks (3), Resolves (3), Suggests (1), Willings (1), and Discloses (1); Complains as Requests (12), Opines (10), Tells (4), and Discloses (1); and Remarks as Opines (11), Requests (8), Tells (5), and Resolves (2). Gemini 2.0 PRO Experimental struggled the most with Opines, which were often misclassified as Tells (47), Discloses (3), Resolves (2), and Requests (1); Complains, which were misclassified as Opines (25), Tells (13), Requests (11), Remarks (2), and Discloses (1); Remarks, which were misclassified as Tells (17), Requests (11), and Opines (4); and Resolves, which were misclassified as Tells (9), Requests (4), and Willings (1). We note that similar patterns emerge in all three scenarios: the most frequently misclassified speech acts were Opines, Tells, Complains, Resolves, and Remarks. This is unsurprising, considering the nature of such speech acts. The delineation between Opines and Tells, for example, is largely subjective (Edmondson, House, and Kádár 2023, 169), and the deciding factor is usually the person annotating the text, who “decides” on some criteria (note that this is not fatal for the methodology, as long as the criteria are applied consistently). This means that if the reference text had been annotated differently (yet still consistently), the chatbots’ success rate order might have been reversed. The accuracy would remain in the same range, as the differences between individual styles of annotation would average out. Overall, the results of Scenario 2 represent a (surprisingly) stellar result. At 75–82% accuracy, all three chatbots’ performances could be used for conducting a preliminary classification of speech acts, which would reduce the workload of researchers conducting speech act analysis.
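Per-category breakdowns of this kind amount to a tally over (reference, predicted) pairs. A minimal sketch, with hypothetical names, assuming the two annotation lists are already aligned line by line:

```python
from collections import Counter

def misclassification_tally(reference_acts, predicted_acts):
    """Count how often each reference speech act was misclassified as
    each predicted act; correct classifications are skipped, mirroring
    the mismatch breakdowns reported in this section."""
    tally = Counter()
    for ref, pred in zip(reference_acts, predicted_acts):
        if ref != pred:
            tally[(ref, pred)] += 1
    return tally
```

For instance, tallying the reference sequence Opine/Opine/Request against the predictions Tell/Opine/Request yields a single ("Opine", "Tell") mismatch.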
Before we examine ChatGPT o1’s results in more detail, a presentation of the Scenario 3 results is in order. The results of Scenario 3 were, surprisingly, slightly worse than those for Scenario 2. While the difference was marginal (a few percentage points, as indicated in the table below), it was detectable in all three instances.
Table 3. Misclassifications of ChatGPT o1 in Scenario 2; the left column represents the misclassified reference category, the columns to the right show how ChatGPT o1 categorized them.
             Tell   Opine   Request   Remark   Suggest   Complain   Willing   Resolve   Promise   Total
Opine        39                       5        4         2                                        50
Resolve      12             2         2                             1                   1         18
Complain     3      3       8         4                                                           18
Remark       2      2       7                  1         1                                        13
Request      4      2                 2        1         2                                        11
Tell                4       3         1                                                           8
Minimise     3      2       1                                                                     6
Suggest                     5                                                                     5
Thanks       1      3                                                                             4
Disclose     2                                                                                    2
Willing      1                                                                                    1
Invite       1      1                                                                             2
Greet        1                                                                                    1
Total                                                                                             139
Table 4. Accuracy of Chatbots in Scenario 3 (long theory + manually annotated example).
                              Total speech acts   Correct classifications   Mismatched classifications   Accuracy (%)
ChatGPT o1                    762                 611                       146                          81
Gemini 2.0 PRO Experimental   762                 563                       199                          74
DeepSeek-R1                   762                 579                       183                          76
We conclude that Scenario 1 was not as effective because of the sheer amount of data the chatbots found online, which resulted in conflicting typologies being applied (as evidenced by the use of classifications that were not in the proposed finite typology), yet Scenario 3 suffered from a similar shortcoming. An explanation of this worse outcome might be readily available in studies examining the empirical relationship between dataset size, model size, and compute power in GenAI. These studies establish that while increasing dataset size improves model accuracy, the benefits taper off beyond a certain threshold (Kaplan et al. 2020; Chowdhery et al. 2022). At that point, models may begin to overfit.
Overfitting is a phenomenon in machine learning that occurs when a model fits the training data too well or even exactly, which results in worse performance on any new or unseen data. To further improve the results, techniques such as regularization, pruning, and dropout might be required to mitigate performance degradation (Kaplan et al. 2020; Chowdhery et al. 2022). In Scenario 3, where we provided an exhaustive 80-page theoretical background, the model might have exhibited a tendency towards overfitting, making unnecessary distinctions and misclassifying instances. This mirrors findings in large-scale model training, where excessive data can paradoxically lead to poorer performance because of increased memorization. This aligns with other studies on machine learning, for example, the findings of Kaplan et al. (2020), which highlight diminishing returns when dataset size surpasses a certain threshold. Still, both Scenario 3 and Scenario 2 produced results in the range of approximately 75–82%, which makes them appropriate for research facilitation and, at the very least, preliminary annotation of data. Indeed, we would argue that the results are even better than the purely statistical data shows. A more qualitative approach to the results of the best-performing chatbot, ChatGPT o1 in Scenario 2, confirms that assertion. We can demonstrate this by examining and contextualizing the kinds of mistakes the chatbot made in individual categories.
Opines
Overall, ChatGPT o1 misclassified 50 Opines; however, thirty-nine of those were misclassified as Tells. The delineation between Opines and Tells is subjective, so a different manual annotation might yield even higher accuracy for ChatGPT o1. In fact, from a research perspective, it would be useful to instruct the chatbot to mark any instance of Opine or Tell as Opine/Tell, and the researcher could then produce a more fine-grained verdict based on the needs of the project.
In the case of The Birthday Party, distinguishing between Opines and Tells proves to be especially difficult, as characters often formulate their opinions as facts, for example, in the exchanges between Stanley and Meg.
Example 1.
STANLEY. The milk’s off. Opine/Tell
MEG. It’s not. Opine/Tell
Example 2.
MEG. Perhaps they couldn’t find the place in the dark. Opine/Opine
It’s not easy to find in the dark. Opine/Tell
STANLEY. They won’t come. Opine/Opine
Someone’s taking the Michael. Opine/Complain
Forget all about it. Request/Resolve
It’s a false alarm. Opine/Tell
A fake alarm. Opine/Tell
In Example 1, ChatGPT o1 marked both utterances as Tell; at least for Stanley’s utterance, this might be correct in certain cases. As annotators, we decided to mark this as Opine because it is a statement that Stanley and Meg dispute, and because of the broader context – Stanley’s badgering of Meg. However, Stanley formulates the utterance as a fact (Tell), so one could also adopt a different criterion and classify it as a Tell. Similarly, in Example 2, the line “It’s not easy to find in the dark” could also be classified as a Tell if taken out of context, but we classified it as an Opine because Meg was continuing her speculation from the preceding utterance (“Perhaps they couldn’t find the place in the dark”). Similarly for Stanley’s “It’s a false alarm”: in most cases, this would be considered a Tell, but because we know from earlier that Stanley is continuing his speculation, we label it as an Opine. The misclassifications of chatbots are thus often related to the broader context and actual meaning of the text, which chatbots have not (yet) mastered.
Resolves
Most Resolves were also mislabelled as Tells, especially in instances when a Resolve followed a Request in an Initiate-Satisfy pattern. Requests are often satisfied with either a Resolve or a Tell, and the chatbot had trouble differentiating between the “No” of a Tell (Did you know? No.)
and the “No” of a Resolve (Come here. No.), as in Examples 3 and 4.
Example 3.
GOLDBERG. Well, of course, you must have one. (He stands.) We’ll have a party, eh? What do you say? Request/Request
MEG. Oh yes! Resolve/Tell
Example 4.
MEG. What do you mean? Request/Request
STANLEY. Come over here. Request/Request
MEG. No. Resolve/Tell
Complains
Complains were most often misclassified as Requests. As can be seen in Examples 5 and 6 below, Complains in the text were often formulated as Requests, so we can see why the chatbots classified them as such. Only the broader context (mostly the utterances before and after the Complain) determines that it is, in fact, a Complain. As in the case of Opines, the chatbots were missing this additional context, a deficiency that might be rectified in further studies via smart prompt engineering.
Example 5. (Lulu is scolding Stanley):
LULU. I mean, what do you do, just sit around the house like this all day long? Complain/Request
Hasn’t Mrs Boles got enough to do without having you under her feet all day long? Complain/Request
Example 6.
MCCANN. Sure I trust you, Nat.
GOLDBERG. But why is it that before you do a job you’re all over the place, and when you’re doing the job you’re as cool as a whistle? Complain/Request
MCCANN. I don’t know, Nat.
Remarks
Interestingly, Remarks were most often misclassified as Requests. Remarks are highly ritualistic speech acts, while Requests are substantive speech acts, so the discrepancy is worth addressing. In our manual annotation, we annotated utterances like those in Example 7 as Remarks, as they were often followed by a more substantive Request and they function more as Remarks in the dialogue (Meg replies to the Requests, not the Remarks); however, a different researcher might, like the chatbots, interpret them as very mild Requests.
Example 7.
What’s his name? Request/Request
MEG.
Stanley Webber. Tell/Tell
GOLDBERG. Oh yes? Remark/Request
Does he work here? Request/Request
MEG. He used to work. Tell/Tell
He used to be a pianist. Tell/Tell
In a concert party on the pier. Tell/Tell
GOLDBERG. Oh yes? Remark/Request
On the pier, eh? Remark/Request
Does he play a nice piano? Request/Request
MEG. Oh, lovely. Opine/Opine
Requests
The chatbots had the most difficulty recognizing Requests that were not in question form, which they classified as Tells, as we can see in Examples 8 and 9. However, all chatbots had remarkable overall results in terms of Requests. ChatGPT o1 correctly identified 242 out of 253 Requests, an accuracy of 96%. This might also be because Requests are often in question form, so they are relatively easy to recognize.
Example 8.
GOLDBERG. You know what I said when this job came up. Request/Tell
I mean naturally they approached me to take care of it. Tell/Tell
And you know who I asked for? Request/Request
MCCANN. Who? Request/Request
Example 9.
MEG. He hasn’t mentioned it. Tell/Tell
GOLDBERG (thoughtfully). Ah! Remark/Request
Tell me. Request/Tell
Are you going to have a party? Request/Request
MEG. A party? Request/Request
Other Speech Acts
Other speech act categories yielded fewer than ten misclassifications each across the entire text. Furthermore, the reasoning behind the mistakes is often similar to the above: the chatbots were unable to recognize the pertinent context. Some Tells, for example, were misclassified as Opines, usually because of emotive language in the utterances. Interestingly, none of the chatbots (excluding one instance in the case of Gemini 2.0 PRO Experimental in Scenario 2) managed to recognize any of the Minimizes in the play, which were usually misclassified as Opines or Tells. Suggests were misclassified as Requests, which is not surprising, considering it is sometimes difficult to articulate the difference between the two.
We use the criterion of benefit for the speaker for Requests and benefit for the hearer for Suggests, but more complex cases, which might benefit both the speaker and the hearer, complicate things and require additional ad hoc criteria. Thanks were misclassified as Opines or Tells, which is also due to a failure to grasp the necessary context of the text. On the bright side, Leave-Takes, Welcomes, and How-are-yous were classified with 100% accuracy by all chatbots in Scenarios 2 and 3. All chatbots also had high accuracy in recognizing other ritualistic speech acts, such as Greets, yet only ChatGPT o1 (in Scenarios 2 and 3) managed to recognize one instance of another ritualistic speech act, Extractor.
5 Conclusion
The purpose of this article was to determine the viability of using different GenAI chatbots for the automated speech act annotation of texts for pragmatic purposes. We sought to answer the following questions:
1. How successful are chatbots in providing an automatically annotated text in line with the instructed speech act typology?
2. Which scenario yields the best outcome (the highest fidelity to manual annotation)?
3. Are chatbots useful for performing such preliminary annotations or, at the very least, facilitating this process?
In line with this, we can draw the following conclusions. Chatbots were (1) highly successful at annotating the text with the prescribed typology, yielding 75–82% accuracy; however, (2) the prompt engineering that instructs the chatbots does matter: Scenario 1 achieved only 26% (ChatGPT 4o), 29% (DeepSeek-R1), and 63% (Gemini 2.0 Flash) accuracy. Chatbots should therefore be provided with a much smaller reference frame (data provided) within which to operate. Furthermore, more is not always better: the more detailed theory in Scenario 3 yielded slightly worse, though still useful, results.
We believe this to be the result of overfitting, which is consistent with findings from other machine learning studies that indicate diminishing returns once a dataset surpasses a certain threshold (Kaplan et al. 2020). Whether the accuracy could be improved further is a subject for future studies, which should experiment with different prompts, as well as with the quantity and perhaps quality of the theory provided to the chatbots. Finally, we believe that the results warrant a tentative conclusion that (3) chatbots, using prompts such as Scenario 2, can be useful and can facilitate research in pragmatics by providing an automated preliminary annotation of the text. That being said, further research is needed, especially regarding how the chatbots would perform in annotating texts that require further context, such as historical documents. One limitation of this study is that we tested the chatbots on a play, which typically includes the relevant context for the viewer/reader, whereas the annotation of a historical text requires further historical context to classify utterances properly. Whether chatbots are capable of that is subject to further research. Furthermore, it would be useful to test the capabilities of chatbots in annotating and classifying texts in other domains of pragmatics and linguistics in general, such as gambits or ritual frame indicating expressions. Considering the relative similarity between the tasks, our approach could be beneficial in these areas as well.

References

Altmäe, Signe, Alberto Sola-Leyva, and Andres Salumets. 2023. “Artificial intelligence in scientific writing: A friend or a foe?” Reproductive BioMedicine Online 47 (1): 3–9. https://doi.org/10.1016/j.rbmo.2023.04.009.
Chen, Xi, Jun Li, and Yuting Ye. 2024. “A feasibility study for the application of AI-generated conversations in pragmatic analysis.” Journal of Pragmatics 223:14–30. https://doi.org/10.1016/j.pragma.2024.01.003.
Chowdhery, Aakanksha, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, et al. 2022. “PaLM: Scaling language modeling with pathways.” arXiv. https://doi.org/10.48550/arXiv.2204.02311.
DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, et al. 2025. “DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning.” arXiv. https://doi.org/10.48550/ARXIV.2501.12948.
Edmondson, Willis, and Juliane House. 1981. Let’s Talk, and Talk about It: A Pedagogic Interactional Grammar of English. Urban & Schwarzenberg.
Edmondson, Willis J., Juliane House, and Daniel Z. Kádár. 2023. Expressions, Speech Acts and Discourse: A Pedagogic Interactional Grammar of English. Cambridge University Press.
Else, Holly. 2023. “Abstracts written by ChatGPT fool scientists.” Nature 613 (7944): 423. https://doi.org/10.1038/d41586-023-00056-7.
Gavez, Urša. 2016. “The reception of Harold Pinter’s plays in Slovenia between 1999 and 2014.” ELOPE: English Language Overseas Perspectives and Enquiries 13 (2): 51–61. https://doi.org/10.4312/elope.13.2.51-61.
Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, et al. 2024. “Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context.” arXiv. https://doi.org/10.48550/ARXIV.2403.05530.
Hazemali, David, Janez Osojnik, Tomaž Onič, Tadej Todorović, and Mladen Borovič. 2024. “Evaluating chatbot assistance in historical document analysis.” Moderna arhivistika 7 (2): 53–83. https://doi.org/10.54356/ma/2024/biub3010.
House, Juliane. 1996. “Developing pragmatic fluency in English as a foreign language: Routines and metapragmatic awareness.” Studies in Second Language Acquisition 18 (2): 225–52. https://doi.org/10.1017/S0272263100014893.
House, Juliane, and Dániel Z. Kádár. 2023. “Studying small talk from a pragmatic angle: An introduction.” Acta Linguistica Academica 70 (4): 411–18.
https://doi.org/10.1556/2062.2023.00704.
House, Juliane, Dániel Z. Kádár, Tadej Todorović, Matjaž Klemenčič, David Hazemali, Tomaž Onič, and Katja Plemenitaš. 2024. “Capturing power in diplomatic language use: The case of a closed-door mediatory negotiation and its aftermath during the breakup of the former Yugoslavia.” Journal of Language and Politics. https://doi.org/10.1075/jlp.24036.hou.
Hribar, Darja. 2004. “Harold Pinter in Slovene translation.” ELOPE: English Language Overseas Perspectives and Enquiries 1 (1–2): 195–208. https://doi.org/10.4312/elope.1.1-2.195-208.
Kádár, Dániel Z., Juliane House, Tadej Todorović, Tomaž Onič, David Hazemali, Katja Plemenitaš, and Donathan Brown. 2024. “The language of diplomatic mediation – A case study of an emergency meeting in the wake of the Yugoslav wars.” Language & Communication 96: 54–66. https://doi.org/10.1016/j.langcom.2024.02.004.
Kaplan, Jared, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. 2020. “Scaling laws for neural language models.” arXiv. https://doi.org/10.48550/arXiv.2001.08361.
Kuhn, Harold William. 1955. “The Hungarian method for the assignment problem.” Naval Research Logistics Quarterly 2 (1–2): 83–97.
Liu, Shiyu, Juliane House, and Dániel Z. Kádár. 2024. “Bargaining in Chinese livestream sales events.” Discourse, Context & Media 60:100787. https://doi.org/10.1016/j.dcm.2024.100787.
Mercer, Sarah, Samuel Spillard, and Daniel P. Martin. 2025. “Brief analysis of DeepSeek R1 and its implications for generative AI.” arXiv. https://doi.org/10.48550/ARXIV.2502.02523.
Mohar, Tjaša, Sara Orthaber, and Tomaž Onič. 2020. “Machine translated Atwood: Utopia or dystopia?” ELOPE: English Language Overseas Perspectives and Enquiries 17 (1): 125–41. https://doi.org/10.4312/elope.17.1.125-141.
Monteiro, Mateus De Souza, Vinícius Carvalho Pereira, and Luciana Cardoso De Castro Salgado. 2023.
“Investigating Politeness strategies in chatbots through the lens of conversation analysis.” In Proceedings of the XXII Brazilian Symposium on Human Factors in Computing Systems, Maceió, Brazil, 1–12. Association for Computing Machinery. https://doi.org/10.1145/3638067.3638068.
Moors, Sarah. 2025. “DeepSeek changes everything we thought we knew about building smart machines.” Digital Health Insights, January 29. https://www.dhinsights.org/news/deepseek-changes-everything-we-thought-we-knew-about-building-smart-machines.
Onič, Tomaž. 2016. “Slogovne značilnosti … [premolk] … Pinterjevega dialoga.” Primerjalna književnost 39 (2). https://ojs-gr.zrc-sazu.si/primerjalna_knjizevnost/article/view/6367.
Onič, Tomaž, and Nastja Prajnč Kacijan. 2020. “Repetition as a means of verbal and psychological violence in interrogation scenes from contemporary drama.” Ars & Humanitas 14 (1): 13–26. https://doi.org/10.4312/ars.14.1.13-26.
Pinter, Harold. 1991. Plays One. Faber & Faber.
Podbevšek, Katarina, and Nina Žavbi. 2021. “Jezikovna norma v luči odrske govorne estetike.” Jezik in Slovstvo 66 (2–3): 145–56. https://doi.org/10.4312/jis.66.2-3.145-156.
Raffel, Colin, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2023. “Exploring the limits of transfer learning with a unified text-to-text transformer.” arXiv. https://doi.org/10.48550/arXiv.1910.10683.
Rane, Nitin, Saurabh Choudhary, and Jayesh Rane. 2024. “Gemini versus ChatGPT: Applications, performance, architecture, capabilities, and implementation.” SSRN Scholarly Paper. Social Science Research Network. https://doi.org/10.2139/ssrn.4723687.
Schechner, Richard. 1966. “Puzzling Pinter.” The Tulane Drama Review 11 (2): 176–84. https://doi.org/10.2307/1125196.
Setlur, Vidya, and Melanie Tory. 2022.
“How do you converse with an analytical chatbot? Revisiting Gricean maxims for designing analytical conversational behavior.” arXiv. https://doi.org/10.48550/ARXIV.2203.08420.
Stokel-Walker, Chris. 2023. “ChatGPT listed as author on research papers: Many scientists disapprove.” Nature 613 (7945): 620–21. https://doi.org/10.1038/d41586-023-00107-z.
Taguchi, Naoko, and Dániel Z. Kádár. 2025. “Pragmatics: An overview.” In The Encyclopedia of Applied Linguistics, edited by Carol A. Chapelle, 1st ed., 1–8. Wiley. https://doi.org/10.1002/9781405198431.wbeal1338.pub2.
Wang, Kevin, Junbo Li, Neel P. Bhatt, Yihan Xi, Qiang Liu, Ufuk Topcu, and Zhangyang Wang. 2024. “On the planning abilities of OpenAI’s O1 models: Feasibility, optimality, and generalizability.” arXiv. https://doi.org/10.48550/ARXIV.2409.19924.
Williams, Iwan, and Tim Bayne. 2024. “Chatting with bots: AI, speech acts, and the edge of assertion.” Inquiry: 1–24. https://doi.org/10.1080/0020174X.2024.2434874.

Appraisal Analysis and AI Chatbots: Do We Even Need Humans?

ABSTRACT

Artificial intelligence (AI) is rapidly transforming various fields, including linguistics, by offering new tools for the analysis and generation of human language. As AI tools, particularly chatbots, have become increasingly sophisticated, questions have arisen about their capacity to replicate complex human linguistic processes, such as those covered by the appraisal framework developed by Martin and White (2005). The appraisal framework examines how three main categories – attitude, graduation, and engagement – are expressed in discourse at the semantic level. This paper investigates how the AI chatbots MS Copilot, ChatGPT, and Claude approach appraisal analysis in a selected text, highlighting similarities and notable differences in comparison to human analysis.
The findings, although based on the analysis of a single text, provide valuable insights into the advantages and drawbacks of AI in mimicking human-like appraisal analysis, which might be beneficial when conducting appraisal research.

Keywords: appraisal, human and AI comparative analysis, ChatGPT, MS Copilot, Claude

Analiza jezika vrednotenja in pogovorni sistemi: ali ljudi sploh potrebujemo?

IZVLEČEK

Umetna inteligenca hitro preoblikuje različna področja, vključno z jezikoslovjem, tako da ponuja nova orodja za analizo in ustvarjanje človeškega jezika. Ker postajajo orodja umetne inteligence, zlasti pogovorni sistemi, vse bolj izpopolnjena, se pojavljajo vprašanja o njihovi sposobnosti ponovitve kompleksnih človeških jezikovnih procesov, kot so tisti, zajeti v jeziku vrednotenja, ki sta ga razvila Martin in White (2005). Okvir jezika vrednotenja preučuje, kako se v diskurzu izražajo tri glavne kategorije – odnos, stopnjevanje odnosov in vključenost – na semantični stopnji. Članek raziskuje, kako pogovorni sistemi MS Copilot, ChatGPT in Claude pristopijo k analizi jezika vrednotenja v izbranem besedilu, tako da osvetli podobnosti kot tudi pomembne razlike skozi primerjavo s človeško analizo. Ugotovitve, čeprav temeljijo na enem izbranem besedilu, omogočijo dragocen vpogled v prednosti in pomanjkljivosti umetne inteligence pri posnemanju človeške jezikovne analize, kar je lahko koristno pri raziskovanju jezika vrednotenja.

Ključne besede: jezik vrednotenja, primerjalna človeška analiza in podprta z umetno inteligenco, ChatGPT, MS Copilot, Claude

2025, Vol. 22 (1), 35–52(228) journals.uni-lj.si/elope https://doi.org/10.4312/elope.22.1.35-52 UDC: 81:004.89

Agata Križan, Aja Barbič, University of Maribor, Slovenia

1 Introduction

The world has recently witnessed immense progress in the development of generative artificial intelligence (GenAI).
A subset of AI known as large language models (LLMs) comprises machine models trained on vast amounts of data to understand and generate natural language and other types of content. This renders them capable of performing a diverse array of tasks, from writing essays and creating articles to answering questions and analysing texts, thereby contributing to language research in a transformational way. Given their extensive training data and their multilayer, transformer-based neural networks, these AI tools can produce texts whose language closely resembles that of humans, often mimicking human communication. For the purposes of this study, the chatbots ChatGPT, MS Copilot, and Claude were used. ChatGPT was launched in 2022 by OpenAI; MS Copilot (MS being short for Microsoft) was released in 2023; and Claude was developed by the research firm Anthropic and released in 2023. All three AI chatbots were trained to follow prompts and provide responses, use up-to-date information, have conversational abilities, understand context, and possess broad knowledge. Since one capacity of AI chatbots is to perform complex analyses, the aim of this paper is to contrast the analysis of appraisal in a selected text as provided by ChatGPT, MS Copilot, and Claude with that provided by human analysts. This study is particularly pertinent since, while appraisal theory is widely researched and has been effectively applied to various texts and genres, qualitative research on the analysis of appraisal as generated by AI chatbots is almost non-existent. ChatGPT has been applied in many research fields, including translation and language studies. Orel Kos (2024) examined the role of LLM-powered machine translation in subtitling instruction, revealing significant differences between students who relied on AI-generated translations and those who produced subtitles manually.
The study highlights the challenges of multimodal awareness, since post-editing AI-generated subtitles requires careful human intervention to ensure accuracy and contextual appropriateness. The first historical review of ChatGPT's performance across various domains established that, despite its many effective applications, ChatGPT still has limitations (Shahriar and Hayawi 2023), which will likely be addressed in newer versions. In this review, ChatGPT's responses to some of the researchers' questions are analysed. Furthermore, several studies have examined ChatGPT's applicability to researching language and language learning. Tica and Krsmanović (2024) explored student perceptions of ChatGPT in ESP (English for Specific Purposes) writing, revealing that while users appreciate its speed and accuracy, they remain divided on its overall effectiveness. Investigating the advantages of corpora and corpus tools over generative artificial intelligence in data-driven learning, Crosthwaite and Baisa (2023) highlighted advantages that corpora still hold over GenAI, such as knowledge of the data, authenticity, replicability, multimodality, safety, active learning, and the absence of hallucinations, while GenAI has the potential to successfully address issues that corpus research has faced. For a more comprehensive understanding of language usage and patterns, the authors argue, a combination of both tools is necessary. Uchida (2024) compared search results from ChatGPT and a large-scale general corpus (COCA), focusing on word frequency lists, collocations, identification of genres, and words fitting certain grammatical patterns. The quantitative results showed that ChatGPT successfully completed most of these tasks, i.e., it identified general linguistic trends and can thus effectively assist in language learning.
The study by Curry, Baker, and Brookes (2024) shows that ChatGPT performs satisfactorily in the semantic categorisation of keywords, although the categories were mainly surface-level, but fails in the analysis of concordances and of function-to-form mapping. Additionally, the study shows that it has (for now) certain limitations for more fine-grained corpus research and does not meet the standards of a human analyst. Imamović et al. (2024) assessed ChatGPT's potential for annotating subcategories of attitude by using 11 TED Talk texts and applying Martin and White's (2005) appraisal theory. The results of this quantitative study show that ChatGPT was successful at identifying linguistic items in the text that carry evaluative meaning. However, the recall was very low, and detailed labelling with categories was incorrect compared with a human annotator. Moreover, an evaluation by Lozić and Štular (2023) of the capabilities of AI chatbots, including Claude 2, in generating scholarly content in the humanities and archaeology has shown that while LLMs have transformed content generation, their ability to produce original scientific contributions in the humanities remains limited. Research by Koeva (2024) revealed that the LLMs Claude 3.5 Sonnet, Gemini 1.5 Pro, GPT-4o, and GPT-4o mini could assist in linguistic research, despite errors.

2 Appraisal

As an interpersonal and evaluative system, appraisal is concerned with the expression of the writer's and speaker's attitudes and emotions towards propositions, as well as with positions towards communicative events and other voices. It is thus concerned with evaluation in written and spoken discourse. According to Hunston and Thompson, evaluation is defined as “the broad cover term for the expression of speaker or writer’s attitude or stance towards, viewpoint on, feelings about the entities or propositions that he or she is talking (or writing) about” (1999, 5).
Functions of evaluation include the expression of the speaker's or writer's opinion, hence a reflection of the value system; the construction and maintenance of relations between the participants in a written or spoken event; and the organisation of the discourse (ibid.). Despite its evaluative nature, the term 'appraisal' is used to emphasise its discourse-semantic aspect (Martin and White 2005). Appraisal systematically covers three domains at the level of discourse semantics: attitude, graduation, and engagement. The domain of attitude is further divided into affect, which deals with language that expresses emotions (e.g., anxious); judgement, dealing with language evaluating people's behaviour and character (e.g., clever); and appreciation, addressing language aesthetically evaluating things, objects, events, and phenomena (e.g., unique). According to Martin and White (2005), attitude can be inscribed (explicitly/overtly expressed), i.e., encoded in attitudinal lexis, or evoked (implicitly/covertly expressed), i.e., implied via ideational meanings and/or co(n)text. Attitude can be positive or negative. Graduation is concerned with the gradability of attitudes. It adjusts the degree of an evaluation (force, e.g., very popular) or the strength of boundaries (focus, e.g., true happiness) (ibid., 37). Engagement deals with the dis/alignment of writers'/speakers' positions and voices with those referenced in the text and by other voices (e.g., seem) (ibid., 34–35). From the perspective of systemic functional linguistics (SFL), appraisal construes the interpersonal metafunction, which is concerned with how participants interact with one another, influence the behaviour of others, construct and fill social roles, adopt attitudinal and evaluative positions, and form bonds, relationships, and alliances (White 2000, 4).
In other words, the interpersonal metafunction of language is defined as “language as action” (Halliday and Matthiessen 2004, 30). The appraisal framework, as developed by Martin and White (2005), has proven useful in systematically revealing the (inter)connection of all subsystems of appraisal language in various texts and genres, contributing to an understanding not only of the evaluative component of texts but also of the social one, as well as of how and why texts mean what they do. With the development and growing capacity of AI, particularly chatbots, appraisal can undoubtedly be analysed by AI. The question is simply how successfully and in which manner – independently or with human assistance.

3 Methodology

This article explores the potential of ChatGPT, MS Copilot, and Claude to identify/annotate instances of appraisal in a selected text, using the appraisal system. It highlights discrepancies and similarities between the three and contrasts them with an analysis performed by a human. Linguistic annotation is vital for a sophisticated exploration of language, providing insights into language use. Annotation can be applied at various levels, including phonetic, prosodic, grammatical, semantic, and pragmatic/discursive (Leech 1993). For the purposes of this analysis, which took place between November 22, 2024 and February 12, 2025, a text was chosen at random, subject to certain prerequisites: the article had to come from a serious newspaper, appear in a current issue, be of average length, be available online, and contain at least some evaluative language. It was selected from the globally renowned British daily newspaper The Guardian and addresses UK universities asking the government to restart the flow of EU students to Britain after Brexit and a return to the Erasmus student exchange programme. The total number of words in the article is 2,496. Freely accessible AI chatbots were used.
The instances of appraisal in the text were identified by two human analysts proficient in appraisal theory (i.e., the authors of this paper, hereafter referred to as human annotators1). The double coding increased objectivity, and the annotation included tags for affect, judgement, and appreciation (subcategories of attitude), explicit/implicit (attitudinal realisation), positive/negative (attitudinal status), graduation, and engagement. After independent coding, the annotations were compared. When the coding differed or an appraisal was not identified at all, the case was discussed with reference to Martin and White's appraisal typology, the analysts' knowledge of appraisal theory and coding experience, and co(n)text. Where necessary, a dictionary was used to check definitions, and where possible, double and even multi-coding of appraisals was accepted for the sake of accuracy and greater objectivity.

1 In this study, the 'human annotator' analyses (identifies/annotates) appraisals according to Martin and White's appraisal typology, while the 'user' is a human using AI for the appraisal analysis via prompting.

For communication with the chatbots, the users employed prompt engineering (i.e., carefully crafting instructions and questions for the chatbots). The human annotators decided on initial prompts that focused solely on the appraisal (analysis) of the given text. The number of prompts for the first chatbot (MS Copilot) was 40, while for the other two it varied slightly, depending on the responses and thus on subsequent prompts. The first prompts targeted the chatbots' knowledge of and familiarity with appraisal theory, instructing them to analyse/identify appraisals in the entire text. Subsequent prompts depended on the preceding responses. First, the prompting was performed for MS Copilot until sufficient data was gathered; then the same or similar prompts, depending on responses,2 were used for ChatGPT and Claude.
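The adaptive prompting workflow described above (initial prompts first, then follow-ups conditioned on what the previous response omitted) can be sketched schematically. The `ask` function and the canned responses below are hypothetical stand-ins, not the study's actual prompts or chatbot outputs:

```python
# Schematic sketch of adaptive prompting: issue initial prompts, then a
# follow-up for every major appraisal category the last response omitted.
# `ask` and the canned responses are invented stand-ins for a real chatbot.

INITIAL_PROMPTS = [
    "What is appraisal theory as developed by Martin J. R. and White P. R. R.?",
    "Identify and categorise appraisals based on Martin and White's appraisal theory.",
]
EXPECTED = {"attitude", "graduation", "engagement"}  # major appraisal categories

def ask(prompt, canned_responses):
    # Stand-in for a chatbot call: look up a pre-recorded response.
    return canned_responses[prompt]

def run_session(canned_responses):
    transcript = []
    for prompt in INITIAL_PROMPTS:
        transcript.append((prompt, ask(prompt, canned_responses)))
    # Issue a follow-up for each major category missing from the last response.
    last = transcript[-1][1].lower()
    for missing in sorted(cat for cat in EXPECTED if cat not in last):
        follow_up = f"Please also identify instances of {missing} in the text."
        transcript.append((follow_up, ask(follow_up, canned_responses)))
    return transcript

# Invented responses: the second mentions attitude only, so the session
# issues follow-ups for engagement and graduation.
canned = {
    INITIAL_PROMPTS[0]: "Appraisal comprises attitude, graduation and engagement.",
    INITIAL_PROMPTS[1]: "Attitude: 'hopeful' (explicit positive affect).",
    "Please also identify instances of engagement in the text.": "Engagement: 'seem'.",
    "Please also identify instances of graduation in the text.": "Graduation: 'very'.",
}
print(len(run_session(canned)))  # 4 turns: 2 initial prompts + 2 follow-ups
```

The point of the sketch is the branching: which prompt is issued next depends entirely on what the previous response covered, mirroring the protocol used with the three chatbots.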
After the chatbots were prompted to define appraisal theory, they were asked to analyse the text in terms of appraisals. Since no comprehensive analysis was provided (e.g., lack of implicit attitudes, clarity, or systematic exemplification), additional, more specific prompts or clarification requests were used. Next, the human annotators divided the text into multiple parts and asked the chatbots to analyse appraisals in these shorter parts. If a chatbot provided a comprehensible answer, no additional prompt was needed. For example, one of the prompts asked about implicit attitudes in the text, and both ChatGPT and Claude answered that graduation and engagement could be expressed implicitly, even though only attitudes can be explicit or implicit. An additional prompt was thus used to ask both chatbots about the source of such information and for an explanation. Another prompt asked the chatbots to identify engagement in a sentence which, based on Martin and White's appraisal theory, used an engaging item. ChatGPT and Claude categorised this item as such, whereas MS Copilot did not recognise it as appraisal, so an additional prompt was used to ask this chatbot specifically about this item. Here are some examples of prompts:

What is appraisal theory as developed by Martin J. R. and White P. R. R.?
Identify and categorise appraisals based on Martin and White’s appraisal theory.
Analyse the given text in terms of appraisals.
Please analyse the following sentence in more detail. (the sentence was provided)
What about ‘toxic’, isn’t ‘toxic’ attitudinal? (referring to the AI tool’s previous response)
Analyse in more detail. (referring to the AI tool’s previous response)
Can ‘chief executive Viviene Stern’ be identified as judgement targeting the responsible position she holds, as well as graduation?
2 There was, for example, no need to use the subsequent prompt asking a particular chatbot for alternative coding when its coding matched the human annotators', even though such a prompt was used for another chatbot whose coding did not match the human annotators'.

Once the prompting was finished for all three chatbots, each response and every coded instance provided by the chatbots were carefully compared with the human analysis, as well as across the three chatbots, in terms of discrepancies and similarities. Given the brevity of this paper, only some discrepancies and similarities are highlighted and illustrated. For the purposes of analysis and comparison, a qualitative research method was used. The methodology in this study is similar to that applied by Hazemali et al. (2024), who used structured questioning to evaluate the GPT-3.5-powered PDFGear Copilot's ability to interpret historical documents. Their study found that while the chatbot performed well in factual retrieval, it struggled with deeper content interpretation. Given that appraisal analysis also requires a nuanced understanding of context and evaluative meaning, similar challenges were expected to arise when AI tools were used in this domain.

4 Comparative Analysis and Findings

The initial prompts revealed that ChatGPT and MS Copilot prioritised extended phrases rather than discrete instances of appraisal, whereas the human annotators and Claude concentrated on analysing individual instances. Only after additional prompting did ChatGPT and MS Copilot begin to highlight individual instances, particularly in relation to graduation and engagement. The responses generated by ChatGPT and Claude were typically more elaborate and specific than those of MS Copilot, providing a summary at the end and highlighting the main points.
All three chatbots organised their analyses systematically, arranging responses around distinct categories. Comprehensive and nuanced analyses from the AI tools, particularly from ChatGPT and MS Copilot, frequently required additional prompting that suggested alternative coding, requested the coding of certain instances absent from the AI analysis, or asked for clarification. In comparison, Claude needed less additional prompting. Surprisingly, the chatbots identified fewer appraisals than the human annotators, especially when dealing with the whole text. A possible explanation is that only some examples were listed; however, in response to one prompt, MS Copilot stated that those were the examples. If the listed appraisals were examples only, this could be perceived as a disadvantage, as it demanded not only additional prompts but also carefully structured ones. If the listed examples were all the identified appraisals, this could likewise be perceived as a disadvantage, as the number of identified appraisals was mostly much lower than that identified by the human annotators. The initial prompts assessing the AI chatbots' familiarity with appraisal theory and requesting an appraisal analysis of the selected text revealed some discrepancies, primarily concerning the length and structure of responses. Claude provided a lengthy and detailed overview of the appraisal framework. Interestingly, graduation and engagement, two major categories alongside attitude, were absent from MS Copilot's appraisal analysis but were included in ChatGPT's and Claude's. For the sake of clarity, a subsequent prompt was more specific, inviting the chatbots to identify and categorise (instead of 'analyse', which was used in the initial prompt) appraisals based on Martin and White's model (2005), and specifically demanding the analysis of explicit, implicit, positive and negative attitudes, as well as attitudinal targets (emoters for affect), instead of simply attitudes.
Some notable discrepancies in attitudinal realisation, categorisation, and attitudinal status emerged between ChatGPT, MS Copilot, Claude, and the human annotators, as exemplified in (1–3).

(1) UK universities …, but hopeful amid talks on youth mobility
(2) We also get a tiny bit uncomfortable
(3) British universities say they … are adopting a “watch and wait” approach

In (1), hopeful was categorised by MS Copilot as implicit negative affect, whereas the human annotators coded it as explicit positive affect (the universities' hope and optimism). The implicit realisation of the attitude seemed problematic, since hopeful is clearly attitudinal lexis. Similarly, MS Copilot coded uncomfortable in (2) as implicit negative affect, while the human annotators coded it as explicit negative affect, since the word conveys the feeling of unease directly rather than indirectly. Claude's analysis did not include hopeful, although it is clearly attitudinal, but included uncomfortable as explicit negative affect. However, when the analysis focused solely on the sentence, Claude identified hopeful as positive appreciation, although dictionaries define it as a feeling. In (3), watch and wait was identified as implicit negative affect by MS Copilot (in the sense of monitoring without any action), whereas the human annotators coded it as implicit positive judgement, targeting the British universities' behaviour based on their cautious and patient approach to avoid undesirable political conflict. Claude did not initially identify this instance as appraisal; only after additional prompting focusing on implicit attitudes and their categorisation did it identify it as implicit negative affect. Interestingly, before the prompt asking for the categorisation of the implicit watch and wait, Claude had characterised the phrase's implicitness as an underlying urgency, which was unclear. This exemplifies the differences in attitudinal categorisation and status.
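Once each coder's labels are recorded, discrepancies such as those in (1–3) can be tabulated and flagged automatically. A simplified sketch, with the codings condensed from the cases discussed above and `None` marking an item a chatbot did not identify as appraisal at all:

```python
# Simplified sketch of flagging coder disagreements against the human coding.
# The codings condense the cases in (1-3); None marks an unidentified item.

codings = {
    # item: (human annotators, MS Copilot, Claude)
    "hopeful":        ("explicit positive affect",    "implicit negative affect", None),
    "uncomfortable":  ("explicit negative affect",    "implicit negative affect",
                       "explicit negative affect"),
    "watch and wait": ("implicit positive judgement", "implicit negative affect",
                       "implicit negative affect"),
}

def disagreements(codings):
    """Map each item to the chatbot codings that diverge from the human coding."""
    out = {}
    for item, (human, *machine) in codings.items():
        diverging = [c for c in machine if c != human]
        if diverging:
            out[item] = diverging
    return out

for item, diffs in disagreements(codings).items():
    print(item, "->", diffs)
```

Recording the three dimensions separately (realisation, category, status) would additionally show whether a disagreement concerns explicitness, the attitude subcategory, or polarity.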
While ChatGPT and MS Copilot did not identify good in good students as appraisal, the human annotators and Claude did. The human annotators categorised good as judgement, the obvious target being the students' capability, whereas Claude initially categorised it as appreciation, despite identifying European students as the target. After a subsequent prompt, Claude finally identified good students as explicit positive judgement. Claude (after the more specific prompt mentioned above) and the human annotators also identified disproportionately in burden rested disproportionately as implicit negative appreciation; Claude merely listed it among implicit attitudes together with the target, whereas the human annotators identified the implicitness based on a prior identification of disproportionately as graduation.

(4) It was absolutely fantastic that youth and students were ‘central’ to the discussion about the reset in relations with the EU.

Although ChatGPT and MS Copilot provided examples in full sentences or longer phrases when invited to analyse the entire text, both chatbots highlighted only fantastic as explicit appreciation in (4). In the analysis of the entire text, all three chatbots and the human annotators identified absolutely fantastic as appreciation, whereas later, when asked about implicit attitudes, MS Copilot identified it as implicit affect conveying optimism and hope. When the analysis focused solely on sentence (4), ChatGPT and MS Copilot identified absolutely fantastic as affect in the sense of enthusiasm and approval, with ChatGPT, interestingly, identifying it as explicit affect. Claude, in contrast, identified the phrase as positive appreciation from the outset.
Based on Martin and White’s (2005, 56) examples of appreciation, fantastic is a positive reaction to the recognition of youth and students in the discussion rather than an emotion that someone feels. The human annotators and Claude also identified central as appreciation in the sense of primary importance, as well as graduation and engagement. However, Claude’s identification of appreciation and engagement occurred only when the individual sentence containing this word was the focus. Only after additional prompting, which was not specifically targeted at central, did one of ChatGPT’s analyses include central as positive appreciation and graduation. Additionally, the human annotators coded it as implicit positive judgement targeting politicians for giving the issue primary attention during the meeting.

(5) It’s not in our interest for the government to end up caught in a kind of toxic debate about immigration domestically.

Regarding attitudinal realisation, toxic in (5) was identified by the human annotators as explicit negative appreciation targeting the debate, whereas Claude categorised toxic debate as implicit negative appreciation of political discourse. Despite its obvious attitudinal significance, neither ChatGPT nor MS Copilot listed this as an example of attitude. Although ChatGPT’s explanation of implicit attitudes as “attitudes [that] are often subtle and rely on the reader’s interpretation of what is implied rather than explicitly declared” is valid, questions arise regarding whose interpretation (voice) is involved in the analysis of implicit attitudes, given that chatbots gather information from the internet, and likely from analyses of appraisal conducted by various human analysts in a variety of contexts. For example, the status of the same word may vary depending on the context.
However, since each text is written with an ‘ideal’ reader in mind (Kress 1988, 107), that is, a description of the reading position to which the actual reader is invited to conform (Macken-Horarik 2003), it is possible that the reading position of the actual reader does not match that of the ideal reader. This may also happen with analysts, and such misalignment may affect the identification of implicit attitudes. Consequently, the identification of implicit attitudes may vary between a human annotator and AI, and between a human annotator and an author. To minimise subjectivity in the analysis, double or multiple coding is necessary to exhaust the possible multiple interpretations (Page 2003). Examining hints such as graduation and engagement (if used), in addition to contextual knowledge, may be helpful in identifying implicit attitudes. ChatGPT, however, when specifically asked about the presence of implicit attitudes after the initial analysis, listed some that were not identified by the human annotators. While this may suggest a more fine-grained analysis than that of its human counterparts, this is not necessarily the case. Some implicit attitudes were identified individually to fit the overall pattern of evaluation across the text, namely the positive evaluation of the pre-Brexit youth and student exchange programmes that UK universities would like to reinstate, whereas the human annotators focused more on wording, as suggested by Thompson (2008). Interestingly, regret in we really, really regret the fact was identified as an implicit attitude by ChatGPT and MS Copilot, whereas the human annotators and Claude coded it as an explicit attitude, considering it attitudinal lexis.3 After challenging ChatGPT and MS Copilot with a prompt reminding them of their preceding classification, both accepted the coding, praising the human annotators’ coding skills.
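The double or multiple coding advocated above can itself be checked quantitatively. As a minimal, purely illustrative sketch (the labels below are hypothetical, not the study’s actual annotations), Cohen’s kappa measures how far two annotators’ category decisions agree beyond chance:

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two annotators' category labels over the same items."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    # observed agreement: proportion of items with identical labels
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # chance agreement, from each coder's marginal label distribution
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# hypothetical appraisal labels for six attitudinal instances
human = ["affect", "judgement", "appreciation", "affect", "judgement", "affect"]
chatbot = ["affect", "appreciation", "appreciation", "affect", "judgement", "judgement"]
print(round(cohens_kappa(human, chatbot), 2))  # prints 0.5
```

A kappa near 1 indicates near-perfect agreement; values in the middle range, as in this toy example, would signal exactly the kind of categorisation discrepancies described here.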
ChatGPT even provided the reason for its explicitness, but still insisted on the presence of implicit affect, explaining rather opaquely that “… the text subtly implies dissatisfaction with these outcomes while maintaining neutrality.” Certain discrepancies were also observed with regard to graduation and engagement. The issue of dealing with whole sentences or phrases was noticeable again when MS Copilot identified the phrase flow of really good European students as graduation, whereas the human annotators and Claude highlighted the individual items flow of and really, which is more precise. Interestingly, MS Copilot identified extremely in extremely important as focus, although it upgrades an explicit attitude (appreciation), and did not recognise the repetition of the intensifier really in really, really regret as graduation, unlike the human annotators, Claude, and ChatGPT. For an accurate quantitative and qualitative analysis, such details are indispensable.

(6) EU data for 2020 shows that 17,795 students came to the UK in 2018/2019, almost double the number of British students, 9,908, that went to the EU. The previous year, 18,839 EU citizens came to British universities compared with 9,540 going to the EU.

Moreover, in (6), MS Copilot provided the unusual explanation that the phrase almost double the number quantified the comparison, although comparison of attitudes is typically regarded as a source of intensification; how can a comparison be quantified? In a prompt requesting the analysis of appraisals, MS Copilot identified the phrase almost double the number as engagement, referring to it as comparison in the explanation, as well as graduation (intensification), whereas ChatGPT identified it as graduation via quantification and engagement.
While the phrase unequivocally conveys graduation via quantification (amount), as rightly recognised by the human annotators and Claude, and, based on the co-text, also via comparison (intensification), its identification as engagement (contrast) by ChatGPT seemed ambiguous. Although it may imply contrast based on the co-text, engagement in the appraisal framework, including counter-expectancy (e.g., although, despite, still), is primarily expressed via grammatical elements, if counter-expectancy was what ChatGPT meant by contrast. The identification of the above phrase solely as graduation seems to reflect greater awareness of engagement and its grammatical realisation on the part of the human annotators and Claude than of ChatGPT and MS Copilot. Claude, for example, did not identify possible in possible return to the Erasmus student exchange programme as engagement, as the human annotators did based on modality, but as graduation, which is also reasonable when the word is regarded as a downgraded version of certainty, something the human annotators should have taken into consideration. However, when engagement in the individual sentence later became the focus of analysis, Claude identified it as engagement under modality, which was one of Claude’s listed sources of engagement. Such examples illustrate that responses provided by chatbots should be carefully studied prior to use in appraisal research. Furthermore, in the initial analysis, only Claude and the human annotators identified data as graduation in (6).

3 Regret is defined as a feeling of sadness, distress, and/or disappointment (https://dictionary.cambridge.org/dictionary/english/regret, https://www.merriam-webster.com/dictionary/regret, https://www.collinsdictionary.com/dictionary/english/regret).
Additionally, the human annotators treated this graduation as a mechanism contributing to the identification of implicit negative appreciation targeting the (disproportional) exchange situation for the British students. Similarly, Claude identified the specific numbers as implying the scale of loss, which may also be interpreted as an attitudinal trigger. When ChatGPT and MS Copilot, which had not identified any implicit attitudes, were challenged with a specific prompt asking whether the issue of the inflow and outflow of exchange students could be coded as implicit negative appreciation, they both accepted such coding. In contrast, no further prompt was needed for Claude, since it identified an implicit attitude instantly. Interestingly, ChatGPT regarded engagement and graduation as implicit, providing an unusual and unclear explanation that engagement and graduation can carry implicit elements, and thus identified central in (4) as a non-explicit intensifier. The response to the subsequent prompt referred to knowledge of appraisal, highlighting Martin and White’s book The Language of Evaluation: Appraisal in English (2005). However, according to this source (2005, 131–32), with reference to engagement, only pronouncement (as a subcategory of engagement) can be realised as explicit or implicit, whereas such realisation does not occur with graduation. Since, for a more accurate and fine-grained appraisal analysis, non-attitudinal graduation as a potential trigger of certain attitudes in specific contexts (Hood 2004) should also be taken into consideration,4 one of the prompts specifically requested the identification of non-attitudinal graders in the whole text. MS Copilot included really, absolutely, and tiny bit as examples of non-attitudinal graders, although they clearly intensify explicit affect and appreciation.
Furthermore, toxic in toxic Brexit row was identified by ChatGPT as non-graduation when the analysis focused solely on the sentence containing this phrase, although it is clearly an intensified negative attitude, as rightly identified by the human annotators and Claude. In the selected article, material is frequently attributed to external sources connected to politics and universities. The credibility of these sources is occasionally signalled by the important, responsible positions their holders occupy, which the human annotators identified as graduation (e.g., specificity or lexically infused intensification) with attitude-evoking potential; chief executive, for example, may imply positive judgement (the same holds for European Commission president). When all three AI tools were asked about such coding, they agreed with it, with ChatGPT and MS Copilot referring to it as social esteem and capability. ChatGPT even specified that the judgement was implicit, stating that the given title was “implicitly judging her as someone with responsibility, credibility, and a mandate to speak on behalf of Universities UK.” Additionally, Claude also identified it as graduation, like the human annotators. This again shows that human assistance is often needed to obtain a more accurate and clear-cut final appraisal analysis. Furthermore, in accepting the coding decisions provided or suggested by the human annotators, all three AI tools appeared to learn from subsequent prompting and incorporated this knowledge into their subsequent analyses.

4 Graduation, for example, has the potential to evoke certain attitudes and values in advertising (Križan 2016).
When, after additional prompting, MS Copilot included graduation and engagement in the analysis of appraisals, various elements were still excluded from the analysis: the counter-expectancy but, the denial not, the prefixes dis- and un-, the quoted material correction mechanism/central, the reporting verbs believes/says/added/expected/idea of, the modality markers might/would/would have had to/seemed/possible, and because as reason. Likewise, ChatGPT did not provide the above-listed elements as examples of engagement. However, contrary to MS Copilot, ChatGPT referred to individual graders in the explanations provided next to whole propositions (examples). Although denials, including the above-mentioned prefixes, were also excluded from Claude’s initial analysis of engagement, its analysis did include the reporting words believes, says, and shows. After dealing with the whole text, subsequent prompts focused on the identification of appraisals in individual sentences instead of the whole text, in order to observe any difference in responses pertaining to text length. What was immediately noticeable were highlighted individual words or short phrases, which were often absent from the analysis of the entire text.

(7) But as Keir Starmer prepares for his first bilateral meeting with the European Commission president, Ursula von der Leyen, on Wednesday, British universities say they are determined not to provoke a return to the ‘toxic’ Brexit row over migration and are adopting a ‘watch and wait’ approach.

Moreover, both MS Copilot and the human annotators identified the phrase determined not to provoke as positive judgement in (7). However, MS Copilot also identified it as engagement, although it was unclear what exactly this engagement referred to: the determination, the provoking, or the use of the denial closing space for alternative views.
The human annotators also identified the denial not as engagement, but, unlike Claude and MS Copilot, they did not identify determined as graduation, although it may be regarded as such if unpacked as decision + firm. As the analysis showed, there were cases where the AI tools accepted a human coding decision or suggestion, as well as cases where the opposite occurred. Although deciding whether a word is semantically infused with intensification can be difficult, unpacking a word into ___ + more, as well as the use of dictionaries, was helpful in many cases. Since dictionaries are part of the internet, where MS Copilot searches for information, it seemed obvious why it recognised determined as graduation. Based on the intensified decision, paired with the denial of provoking, the human annotators coded the phrase determined not to provoke as judgement, as did MS Copilot and Claude, whereas ChatGPT identified it as implicit affect. However, intense determination not to do something harmful points to positive tenacity/propriety (judgement) rather than feelings. Additionally, the human annotators and Claude identified but in (7) and (8) as engagement. Such coding was also accepted by ChatGPT and MS Copilot following a more specific prompt, after it had been absent from their analyses. This was surprising, because but points to the author’s strong presence in the text. In contrast to all three chatbots, the human annotators also identified the fact as engagement in (8).

(8) “We really, really regret the fact we have lost a flow of really good European students into the UK,” said the chief executive of Universities UK, Vivienne Stern. But she said she recognised the “toxic” domestic politics surrounding the prospect of EU citizens returning at scale to education in the UK.
(9) “We also get a tiny bit uncomfortable when you think that something which is extremely important to us might be bound up in big politics.”

Moreover, the phrase we also get a tiny bit uncomfortable in (9) was regarded as engagement by MS Copilot, which explained this in an unclear way, i.e., that it “acknowledge[s] the speaker’s feelings and allowing for other viewpoints”. What is meant by the acknowledgment of feelings? Does this refer to the alignment of feelings between the author of the text and the external source, hence opening up space for alternative views? The human annotators identified the whole proposition as material attributed to the external source. Despite MS Copilot’s claim that its analysis was detailed, it could not be viewed as such, since certain appraisals identified by the human annotators, such as positive appreciation (important) and engagement (might), were absent from MS Copilot’s analysis. Claude identified might as hedging in the initial analysis, listing hedging next to force and focus under graduation, which was unusual, since attitudes can only be graded in force or focus. It is true that hedges can be used to express degrees of certainty and uncertainty, but this is categorised as engagement (entertain) by Martin and White (2005, 98). The human annotators identified might as engagement. However, when the analysis focused on the individual sentence (9), Claude did identify might as engagement. Furthermore, the human annotators identified also, conveying addition, as graduation in the sense of upgrading the negative emotions felt around the issue of Erasmus exchanges. Moreover, Claude identified recognised in (8) as graduation and as positive judgement of Stern’s diplomatic stance, which the human annotators, ChatGPT, and MS Copilot overlooked. While such coding is certainly reasonable, it can only be implicit judgement, which Claude did not state overtly (in an example before recognised, Claude used implies to signal implicitness).
Unlike the human annotators, none of the AI tools identified the prefix un- as engagement (uncomfortable = not comfortable, i.e., a denial).

(10) “It’s not in our interest for the government to end up caught in a kind of toxic debate about immigration domestically, because in the end that is going to hurt us badly if it drives government to be clamping down on immigration in other ways,” she said.

While ChatGPT, MS Copilot, Claude, and the human annotators identified the phrase hurt us badly in (10) as negative affect, based on the feelings that universities will experience if the debate forces the government to curb immigration in other ways, Claude also identified it as negative judgement targeting “the potential governmental consequences” and thus the government, which also seems a reasonable coding. Moreover, MS Copilot identified the phrase toxic debate as negative judgement, as did Claude in the initial analysis, evaluating the debate (harmful and undesirable), whereas the human annotators coded it as negative appreciation targeting the debate as an inanimate entity and the initial target. Finding the initial target can be an essential element in ascribing categories (Thompson 2014, 58). Although it is obvious that the toxic debate was produced by politicians, and is thus connected to behaviour, the judgement is implied rather than inscribed. In (10), the human annotators also identified other appraisals that were absent from MS Copilot’s analysis, such as about, kind of, and other ways as graduation, and because, not, and if as engagement, with the latter two also identified as such by ChatGPT. Claude, like the human annotators, identified (about) immigration as graduation in terms of specificity. Interestingly, kind of was identified as engagement by Claude and ChatGPT, although both described its softening characterisation. Moreover, when asked specifically about the engaging nature of because, all three chatbots accepted such coding.
Moreover, with ChatGPT’s and Claude’s identification of domestically as graduation as focus, specificity as a source of graduation was likely acknowledged. Additionally, the whole attributed material was identified as engagement by the human annotators, whereas MS Copilot identified only the phrase it’s not in our interest within the attributed material as engagement. MS Copilot’s explanation that the phrase expresses certainty, and thus closes the dialogic space for alternative interpretations, is unclear as to what certainty means here. The denial not in the phrase does show the speaker’s engagement, but the reference to certainty remained unclear.

(11) Speaking in New York on Friday, Starmer seemed to have softened his resistance to the idea of a youth mobility scheme allowing under-30s to return to the EU for working holiday stints.

In (11), seemed to, an important engagement element opening up space for alternative views, was absent from ChatGPT’s and MS Copilot’s analyses, but not from the human annotators’ and Claude’s. The Russian doll effect (Thompson 2014) was observed in Claude’s identification of softened his resistance as both explicit positive judgement and implicit negative judgement, targeting political flexibility and the previous rigid stance, respectively. Additionally, Claude, like the human annotators, identified under-30s and working holiday as graduation, but not EU, which the human annotators coded based on location. Claude and the human annotators, unlike ChatGPT and MS Copilot, also identified softened as graduation, as it clearly downgrades the intensity lexically. The human annotators’ coding of idea as engagement was absent from ChatGPT’s, MS Copilot’s, and Claude’s analyses, although it clearly introduces an external source. Interestingly, Claude identified the idea in another sentence as engagement, although both cases indicate material attributed to an external source.
Moreover, Claude’s identification of youth mobility scheme as positive appreciation seems fuzzy, since it does not clearly indicate its implicit realisation, as Claude occasionally does in brackets or by using lexis that expresses implication.

(12) UK universities urge government to restart flow of EU students after Brexit

In (12), the human annotators coded urge as positive affect alongside engagement, because of its likely connection with the desire UK universities have for the reinstatement of student exchanges, whereas ChatGPT, MS Copilot, and Claude did not code it as such. Furthermore, MS Copilot immediately refuted such coding when challenged via a subsequent prompt, whereas ChatGPT agreed with it. Claude identified it as “positive judgment of universities’ proactive stance,” which may be reasonable, but is more implicit than explicit in nature. Since no indication of implicitness was provided by Claude, as mentioned above, users would regard such attitudes as explicit. According to deVore (1949), urge can be connected to emotions when it is stimulated by some contact within an environment, and the UK universities are stimulated in terms of wanting to attract as many EU students as possible. Since the appraisal system relies heavily on context, the annotators’ exactness is clearly evident in this example, while a failure to give precise and direct answers can be detected in the AI responses. ChatGPT’s reply was even more human-like, acknowledging the annotators’ thoughtful question. However, surprisingly, when MS Copilot was prompted again to explain why urge could not imply affect in terms of wish or desire, it agreed with such coding.
If MS Copilot had not been asked for clarification, the user would have accepted the first analysis, missing out on other possible interpretations and codings, which would have diminished the thoroughness of the analysis. When asked whether urge was explicit or implicit affect, MS Copilot said it was explicit, making this coding compatible with that of the human annotators. In contrast, ChatGPT acknowledged its implicit realisation. When Claude was asked whether urge could be affect, it still insisted on judgement as a better choice than affect, providing an explanation contrasting the two categories. However, after subsequent prompts clarifying the meaning of urge as wish or desire, Claude finally accepted its coding as implicit affect. Interestingly, when the same prompt was used again later, Claude identified urge as explicit affect (the same happened with hopeful). This may be a good illustration of the AI’s learning nature. Furthermore, the AI tools’ identification of urge as graduation, which the human annotators overlooked, was reasonable, since it can be unpacked as desire + strong. On the other hand, certain instances, such as after Brexit, UK, and EU (quantification as time and extent), were identified as graduation by the human annotators, but not by MS Copilot. Claude identified EU students as graduation in terms of specificity, which signals broader knowledge of graduation. MS Copilot also coded restart as graduation via quantification, whereas the human annotators coded it as graduation via intensification because of the repetition conveyed by the prefix re-. Although Claude also identified restart as graduation, its explanation that it “precisely frames the desired action” is opaque. It was further observed that ChatGPT referred to the phrase restart flow of EU students after Brexit as an implicit attitude, but listed it as an explicit one, which was extremely confusing.
(13) Up to now, most of the focus on reviving post-Brexit opportunities for young people has been focused on an EU proposal in April for a youth mobility scheme that would allow under-30s to study or work abroad for a limited number of years.

In (13), the human annotators identified the same graders as all three AI tools (e.g., limited number of years, most of), adding also post-Brexit, focused, focus, and abroad as graduation (quantification via time and place, and focus specificity). While MS Copilot identified in April as graduation via specificity, the human annotators identified it as graduation via quantification (time). Interestingly, MS Copilot and Claude identified up to now as engagement, although it is clearly graduation (quantification as time/extent). Moreover, Claude identified reviving as implicit negative appreciation targeting the current state that needs revival, as well as implicit positive appreciation targeting “potential future opportunities,” which is unclear, whereas the human annotators coded it as explicit positive appreciation based on its denotational meaning, to bring something back to life.

(14) But as Keir Starmer prepares for his first bilateral meeting with the European Commission president, Ursula von der Leyen, on Wednesday, […]

In contrast to all three chatbots, the human annotators identified implied positive judgement targeting Ursula von der Leyen’s position in (14). Furthermore, upon a plausible explanation by ChatGPT, pointing indirectly to Starmer’s diligence and readiness via his preparation, the coding of this proposition as implicit positive judgement became obvious. Claude, on the other hand, coded bilateral as implicit positive appreciation based on the political discourse (context), in which such a meeting is significant and carries diplomatic weight. With this, Claude showcased strong awareness of context.
Moreover, since the European Commission president, Ursula von der Leyen was identified as non-attitudinal by MS Copilot, why did it categorise the phrase as explicit appreciation? And since Ursula von der Leyen’s role was identified as the target of evaluation by MS Copilot, could it then also have been identified as judgement? Unfortunately, the user did not ask for clarification, which again points to the problem of providing sufficient prompting, which may not only be time-consuming but may also require elevated levels of creativity and exactness in forming prompts.

5 Conclusion

This paper compares the analysis of appraisal performed by ChatGPT, MS Copilot, and Claude with that performed by humans on a selected newspaper article. By applying a qualitative research method, the paper provides important insight into the differences and similarities in the identification and annotation of appraisals, based on the systematic and fully developed evaluative model of Martin and White (2005). While humans rely on context and subjective human experience alongside knowledge of appraisal theory to analyse a text in terms of appraisals, AI models depend on pre-trained datasets to approximate these functions. ChatGPT’s responses can be extensive, yet they often lack grounding in facts, and repetition of previous answers or sentences is evident. MS Copilot’s responses can be less conversational, focusing more on providing straightforward answers, although they are occasionally even more conversational than ChatGPT’s. ChatGPT and MS Copilot generally selected broader phrases for analysis rather than individual appraisal instances, without highlighting the explicit element responsible for the evaluation, which was occasionally confusing. In contrast, Claude focused on individual instances. All three chatbots showed adaptability when prompted for clarification, often accepting coding suggestions and refining their analyses accordingly.
ChatGPT provided answers similar to MS Copilot’s, yet after the generated answer it added systematic summaries (sometimes lengthy and repetitive), offering a useful structured reflection on the analysis. On the other hand, this tendency to prolong the answer could be interpreted as an attempt to make the analysis seem precise. All three chatbots provided better answers when dealing with smaller sections of text, although even in such cases subsequent prompting was often necessary. Additionally, consistency was occasionally problematic after subsequent prompting, which is in line with the research by Imamović et al. (2024). Moreover, since certain explanations next to examples were ambiguous, a closer (and time-consuming) examination of these was required. All three AI tools demonstrated certain discrepancies in categorising attitude, graduation, and engagement, with ChatGPT and Claude showing greater alignment with the human annotators in identifying implicit attitudes. Additionally, discrepancies were also observed in the identification of attitudinal status and realisation. All three chatbots also often lacked consistency in distinguishing implicit from explicit attitudes, frequently requiring human intervention for accuracy. Based on ChatGPT’s answers, however, the authors noticed a recurring pattern: if ChatGPT provided information about a phrase or sentence being expressed implicitly or explicitly, it usually did not provide information about whether the phrase or sentence was expressed positively or negatively, which was also noticed in some of Claude’s responses. The human annotators identified more appraisal instances overall when dealing with the whole text, often pinpointing nuances that the AI tools overlooked, such as specific instances of engagement and graduation.
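The observation that one annotator set identifies more instances than another can be made explicit by treating the human annotations as a reference set and computing precision and recall over the items each tool identified. The following is an illustrative sketch only; the (word, category) pairs are hypothetical and do not reproduce the study’s data:

```python
def precision_recall(predicted, reference):
    """Precision and recall of one annotator's identified items against a reference set."""
    predicted, reference = set(predicted), set(reference)
    tp = len(predicted & reference)                      # items both sides identified
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    return precision, recall

# hypothetical appraisal items as (word, category) pairs
human = {("hopeful", "affect"), ("uncomfortable", "affect"),
         ("toxic", "appreciation"), ("good", "judgement")}
chatbot = {("hopeful", "affect"), ("toxic", "appreciation"),
           ("fantastic", "appreciation")}
p, r = precision_recall(chatbot, human)
# here 2 of the chatbot's 3 items match the human set (precision 2/3),
# covering 2 of the 4 human items (recall 1/2)
```

Low recall with high precision would correspond to the pattern reported above, where the chatbots’ identifications are often defensible but miss instances that the human annotators capture.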
When the focus was specifically on implicit attitudes, Claude in particular often identified more attitudes than the human annotators. While a closer study of some of these attitudes showed that their identification was useful for the human annotators, it also pointed to potential over-analysis, especially since Claude, as it admitted, went beyond appraisal theory by drawing on knowledge of pragmatics, sociolinguistics, and critical discourse analysis, which is reasonable and welcome in itself. However, with appraisal being the focus of the study, this could be problematic, since implicitness and evaluation in the literature may encompass various elements. The findings suggest that while AI tools such as ChatGPT, MS Copilot, and Claude can provide valuable insight into appraisal via rapid responses, they cannot entirely replace human annotators in capturing the complexity of evaluative language, in terms of accuracy in the identification of appraisals and implicit attitudes, along with consideration of context, which can be beneficial for appraisal research. This echoes the conclusions drawn by Hazemali et al. (2024), where the GPT-3.5-powered PDFGear Copilot exhibited competence in retrieving explicit information but struggled with in-depth interpretative tasks. Given the parallels between historical document analysis and appraisal research, it is evident that AI chatbots require human oversight to ensure accurate and contextually appropriate linguistic analysis. In other words, sole reliance on AI chatbots for an accurate and fine-grained analysis of appraisal is so far insufficient, and human assistance is indispensable. This study is based on a single text, so the results may not be fully generalisable across genres, datasets, and linguistic contexts. Additionally, the results may also be affected by previously published appraisal analyses accessible to the AI, by situational and cultural context, and by the use of linguistic sources (co-text).
For example, human and AI analyses of more factual texts, which deploy mainly explicit attitudes and less authorial intervention, might be more in sync than analyses of texts that are rich in figurative language or allow for a greater variety of interpretation. Although the current study refrains from generalisation, owing to its exploration of a single text, it lays the foundation for future research that could further explore appraisal coding by utilising a larger database, a variety of media outlets, or other AI chatbots such as Gemini, Perplexity, Qwen, and DeepSeek. Future research could also investigate any subjectivity and/or (non-)bias in implicit attitudes when these are identified by the AI tools, since human analysts should strive to adopt as neutral a reading position as possible, although, according to Martin and White (2005, 207), undesirable subjectivity cannot be entirely excluded from human appraisal analysis.

References

Crosthwaite, Peter, and Vit Baisa. 2023. “Generative AI and the end of corpus-assisted data-driven learning? Not so fast!” Applied Corpus Linguistics 3 (3): 100066. https://doi.org/10.1016/j.acorp.2023.100066.

Curry, Niall, Paul Baker, and Gavin Brookes. 2024. “Generative AI for corpus approaches to discourse studies: A critical evaluation of ChatGPT.” Applied Corpus Linguistics 4 (1): 100082. https://doi.org/10.1016/j.acorp.2023.100082.

deVore, Nicholas. 1949. “The urges and the emotions.” In New Frontiers of Psychology, by Nicholas deVore, 42–53. Philosophical Library. https://doi.org/10.1037/13248-006.

Halliday, Michael A.K., and Christian M.I.M. Matthiessen. 2004. An Introduction to Functional Grammar. Hodder Arnold.

Hazemali, David, Janez Osojnik, Tomaž Onič, Tadej Todorović, and Mladen Borovič. 2024. “Evaluating chatbot assistance in historical document analysis.” Moderna arhivistika 7 (2): 53–83. https://doi.org/10.54356/ma/2024/biub3010.

Hood, Susan. 2004.
“Appraising research: Taking a stance in academic writing.” PhD diss., University of Technology, Sydney. http://www.grammatics.com/appraisal/suehoodphd/hood_title_page.pdf.
Hunston, Susan, and Geoff Thompson, eds. 1999. Evaluation in Text: Authorial Stance and the Construction of Discourse. Oxford University Press.
Imamović, Mirela, Silvana Deilen, Ekaterina Lapshinova-Koltunski, and Dylan Glynn. 2024. “Using ChatGPT for annotation of attitude within the appraisal theory: Lessons learned.” In Proceedings of the 18th Linguistic Annotation Workshop (LAW-XVIII), edited by Sophie Henning and Manfred Stede, 112–23. Association for Computational Linguistics.
Koeva, Svetla. 2024. “Large language models in linguistic research: The Pilot and the Copilot.” In Proceedings of the Sixth International Conference on Computational Linguistics in Bulgaria (CLIB 2024), 319–28. Department of Computational Linguistics, Institute for Bulgarian Language, Bulgarian Academy of Sciences. https://aclanthology.org/2024.clib-1.35/.
Kress, Gunther. 1988. Communication and Culture: An Introduction. New South Wales University Press.
Križan, Agata. 2016. “The language of appraisal in British advertisements: The construal of attitudinal judgement.” ELOPE: English Language Overseas Perspectives and Enquiries 13 (2): 199–220. https://doi.org/10.4312/elope.13.1.15-29.
Leech, Geoffrey. 1993. “Corpus annotation schemes.” Literary and Linguistic Computing 8 (4): 275–81.
Lozić, Edisa, and Benjamin Štular. 2023. “Fluent but not factual: A comparative analysis of ChatGPT and other AI chatbots’ proficiency and originality in scientific writing for humanities.” Future Internet 15 (10): 336. https://doi.org/10.3390/fi15100336.
Macken-Horarik, Mary. 2003. “Appraisal and the special instructiveness of narrative.” Text – Interdisciplinary Journal for the Study of Discourse 23 (2): 285–312. https://doi.org/10.1515/text.2003.012.
Martin, Jim R., and Peter Robert Rupert White. 2005.
The Language of Evaluation: Appraisal in English. Palgrave Macmillan.
Orel Kos, Silvana. 2024. “Introduction of machine translation into audiovisual translation teaching.” ELOPE: English Language Overseas Perspectives and Enquiries 21 (1): 185–208. https://doi.org/10.4312/elope.21.1.185-208.
Page, Ruth E. 2003. “An analysis of appraisal in childbirth narratives with special consideration of gender and storytelling style.” Text 23 (2): 211–37. https://doi.org/10.1515/text.2003.012.
Shahriar, Sakib, and Kadhim Hayawi. 2023. “Let’s have a chat! A conversation with ChatGPT: Technology, applications, and limitations.” Artificial Intelligence and Applications 2 (1): 11–20. https://doi.org/10.47852/bonviewaia3202939.
Thompson, Geoff. 2008. “Appraising glances: Evaluating Martin’s model of APPRAISAL.” Word 59 (1–2): 169–87. https://doi.org/10.1080/00437956.2008.11432585.
—. 2014. “AFFECT and emotion, target-value mismatches, and Russian dolls: Refining the APPRAISAL model.” In Evaluation in Context, edited by Geoff Thompson and Laura Alba-Juez, 47–66. John Benjamins.
Tica, Lena, and Ivana Krsmanović. 2024. “Overcoming the writer’s block? Exploring students’ motivation and perspectives on using ChatGPT as a writing assistance tool in ESP.” ELOPE: English Language Overseas Perspectives and Enquiries 21 (1): 129–49. https://doi.org/10.4312/elope.21.1.129-149.
Uchida, Satoru. 2024. “Using early LLMs for corpus linguistics: Examining ChatGPT’s potential and limitations.” Applied Corpus Linguistics 4 (1): 100089. https://doi.org/10.1016/j.acorp.2024.100089.
White, Peter Robert Rupert. 2000. Functional Grammar. Centre for English Language Studies, University of Birmingham.

Primary Source

O’Carroll, Lisa. 2024. “UK universities urge government to restart flow of EU students after Brexit.” The Guardian, September 30.
https://www.theguardian.com/education/2024/sep/30/uk-universities-urge-government-to-restart-flow-of-eu-students-after-brexit.

Part III: Academic Writing

2025, Vol. 22 (1), 55–68(228)
journals.uni-lj.si/elope
https://doi.org/10.4312/elope.22.1.55-68
UDC: [811.111’243:378]:004.89

Silvana Neshkovska
University “St. Kliment Ohridski”, Bitola, North Macedonia

The Benefits and Risks of AI-Assisted Academic Writing: Insights from Current Research

ABSTRACT

This paper explores the transformative role of Artificial Intelligence (AI) tools, specifically ChatGPT, in the acquisition of English as a foreign language. With the rapid evolution of educational technology, AI-driven chatbots like ChatGPT offer innovative methodologies to augment language teaching and learning. This study examines the potential of ChatGPT to improve English language students’ writing abilities by providing suggestions, corrections and automated assistance. Through a review of existing literature and a discussion of the findings of recent studies, the paper seeks to highlight the benefits and risks of integrating AI tools into language education, especially in the context of writing. Insights gained from multiple studies suggest that while ChatGPT has the potential to significantly enhance language students’ writing skills in all phases of writing, by promoting engagement, motivation, and autonomy among learners, it also necessitates cautious use to ensure academic integrity and to prevent over-reliance, which, in turn, can stifle students’ learning capacities.

Keywords: writing, EFL, AI, ChatGPT, benefits, risks

Prednosti in tveganja pri znanstvenem pisanju s pomočjo umetne inteligence: spoznanja iz aktualnih raziskav

IZVLEČEK

Prispevek raziskuje, kako so orodja umetne inteligence (UI), zlasti ChatGPT, preoblikovala učenje angleščine kot tujega jezika.
S hitrim razvojem izobraževalne tehnologije pogovorni sistemi, kot je ChatGPT, ponujajo inovativne pristope za nadgradnjo poučevanja in učenja jezikov. Študija preučuje potencial orodja ChatGPT za izboljšanje pisnih spretnosti študentov in študentk angleščine s pomočjo predlogov, popravkov in samodejne pomoči. Na podlagi pregleda obstoječe literature in analize ugotovitev nedavnih raziskav prispevek osvetljuje prednosti in tveganja pri vključevanju orodij umetne inteligence v učenje jezikov, še posebej na področju pisanja. Ugotovitve številnih študij kažejo, da lahko ChatGPT bistveno izboljša pisne zmožnosti študentov in študentk v vseh fazah procesa pisanja, saj spodbuja njihovo vključenost, motivacijo in samostojnost. Kljub temu pa njegova uporaba zahteva premišljeno rabo, saj je treba zagotoviti spoštovanje akademske integritete in preprečiti pretirano zanašanje na tehnologijo, kar bi lahko zavrlo razvoj učnih sposobnosti.

Ključne besede: pisanje, angleščina kot tuji jezik, umetna inteligenca, ChatGPT, prednosti, tveganja

1 Introduction

Writing is a fundamental language skill that foreign language learners must acquire, yet it remains one of the most challenging aspects of language acquisition. This is particularly true for academic writing, a critical skill that students must master at the university level (Yang 2024; Özçelik and Ekşi 2024; Malá, Brůhová, and Vašků 2022). With the advent of artificial intelligence (AI), foreign language acquisition is undergoing a profound transformation. An ample body of literature shows that teaching and learning practices are being fundamentally reshaped, and this shift extends to the domain of writing as well. AI, particularly through chatbots like ChatGPT, has introduced a new dimension to the development of writing skills.
On the one hand, it offers significant opportunities for enhancing students’ writing proficiency; on the other hand, it presents risks and challenges that may disorient students and seriously undermine their academic growth and performance (Nguyen, Ngoc, and Dan 2024; Imran and Almusharraf 2023; Masoudi 2024; Briggs 2018; Mun 2024, etc.). Many of these studies, apart from emphasizing the advantages of AI in language learning (Jazbec 2024), also shed light on students’ perceptions of and experiences with the use of AI in this context (Mahapatra 2024; Rahmi et al. 2024; Artiana and Fakhrurriana 2024; Mun 2024; Özçelik and Ekşi 2024; Nguyen, Ngoc, and Dan 2024; Song and Song 2023; Tica and Krsmanović 2024; Khampusaen 2025, etc.).

Drawing on recent studies conducted in various parts of the world, this paper aims to highlight the practical implications of using a specific AI-driven tool, ChatGPT, in foreign language classrooms. More precisely, by reviewing the existing literature on AI-assisted academic writing, this study explores potential strategies for effectively utilizing ChatGPT in completing academic writing assignments. It examines how language students can leverage such technologies to enhance their writing skills, improve efficiency, and receive personalized support. At the same time, the study considers the risks and implications that the incorporation of such technologies might have for students’ academic well-being. Lastly, by reviewing the findings of recent research, this study attempts to shed some light on students’ perceptions of the use of ChatGPT in academic writing.

2 Theoretical Background

2.1 Academic Writing in the Context of EFL

Writing is often characterised as the most challenging of the four language skills for second-language learners (Richards and Renandya 2002; Hyland 2003; Tica and Krsmanović 2024). This view is widely supported by researchers, teachers (Hyland 2003) and language students (Byrne 1993, in Tran 2024).
Writing proficiency is often seen as a key factor for success in exams, recruitment tests, and general social standing (Dastgeer, Afzal, and Atta 2021, in Nguyen, Ngoc, and Dan 2024, 171). More specifically, writing serves as a crucial prerequisite not only in education but also in personal and professional endeavours (Yang 2024; Özçelik and Ekşi 2024), because it promotes communication, enhances thinking skills and encourages reflection among students (Klimova 2012, in Özçelik and Ekşi 2024).

However, viewed from another perspective, the complex cognitive processes that underlie writing make it extremely challenging for foreign language learners. Students are required to produce, arrange, and transform their thoughts, opinions, attitudes, and feelings clearly and coherently in written form (Richards and Renandya 2002). According to Nunan (2003, 88), writing is “the mental work of inventing ideas, thinking about how to express them, and organising them into sentences and paragraphs that will be clear to a reader.” Thus, proficient English writing necessitates not only a comprehensive understanding of the language – an extensive lexicon, appropriate word selection, grammatical principles, punctuation and spelling rules – but also knowledge of layout conventions, sentence and paragraph organisation, and appropriate use of register and style (Nguyen, Ngoc, and Dan 2024; Özçelik and Ekşi 2024; Sari and Agustina 2022). Similarly, Ferris (2018) emphasises that effective academic writing involves both an advanced grasp of linguistic aspects (e.g., vocabulary, spelling, grammar, cohesive devices, punctuation, capitalization, and formatting) and sufficient knowledge of extra-linguistic features (e.g., the content and context of writing, its purpose and its audience). According to Mun (2024), an additional factor that complicates matters further is the time limitation that normally accompanies academic writing assignments.
Because of time constraints, students can lose the motivation to invest themselves fully in the writing process, which, in turn, seriously hinders the development of their writing abilities. Clearly, academic writing (irrespective of its format – essays, reports, studies, etc.) is not just a matter of linguistic competence; it requires broader socio-cultural and world knowledge. Taking all of this into consideration, it is unsurprising that many tertiary-level students find writing assignments daunting (Artiana and Fakhrurriana 2024; Khatter 2019; Rahmat et al. 2017). As Campbell (2019, in Rahmi et al. 2024) rightfully points out, academic writing in English is a complex and integrative task, not only for international students but for native speakers as well.

2.2 AI in Education, Foreign Language Acquisition and Writing

Recent years have seen a visible surge in AI-powered tools, which have left an indelible mark on several sectors, including education. These novel, versatile tools can perform multiple functions and are consequently seen as promising resources that can enhance student learning (Nazari et al. 2021, in Rahmi et al. 2024). Their capacity to exhibit human-like behaviour and cognitive abilities, including learning, self-correction, adaptation, reasoning, problem-solving, decision-making, and language comprehension, makes them especially beneficial in educational environments (Shidiq 2023, in Artiana and Fakhrurriana 2024; Popenici and Kerr 2017, in Rahmi et al. 2024). Chatbots are a special type of AI-driven tool that is particularly advantageous in foreign language acquisition (Nguyen, Ngoc, and Dan 2024; Batanero et al. 2021, in Tran 2024).
Researchers outline a long list of distinct benefits of using chatbots in language learning contexts: the creation of a relaxed learning environment; heightened student motivation; enhanced student enjoyment; reduced language anxiety; access to diverse learning resources; immediate and effective feedback on spelling and grammar; facilitation of reading and listening practice; and the provision of patient conversation partners (Fryer and Carpenter 2006, 9–10). These AI tools are also credited with reinforcing students’ sense of autonomy and engagement (Yang 2024), their creative and critical thinking and problem-solving capabilities (Karataş et al. 2024; Kasneci et al. 2023), and with enlarging students’ vocabulary (Kohnke, Moorhouse, and Zou 2023).

Writing skills have also been significantly impacted by the application of these technological advances in the foreign language classroom (Kasneci et al. 2023). Purcell et al. (2013, in Rahmi et al. 2024), in that respect, purport that the positive influence of these digital technologies on students’ writing extends to both non-native and native English users.

Among the AI-driven tools, ChatGPT holds the place of honour. Released in November 2022,[1] ChatGPT is a Large Language Model (LLM) that has changed the education scene immensely (Nguyen 2023). This text generation tool rapidly reached over 100 million users and attained a market-leading position (van Dis et al. 2023; Peachey 2023; Hu 2023; Dobrin 2023). Although ChatGPT is neither the first nor the only AI-driven chatbot, what sets it apart from other chatbots is that it was pre-trained on a vast corpus of human-generated texts, which is why it excels at using natural language and generating highly human-like texts (Yang 2024; Anderson 2023, in Jen and Salam 2024).
In fact, because of all the texts to which it was exposed during training, it generates immediate responses to text-based instructions provided by the user (“prompts”) (Hellstrom 2024). Depending on the prompts it receives, it can provide answers to questions and generate different kinds of text (Farina and Lavazza 2023, 2), ranging from social media posts to emails, blog articles, and overviews of research studies; it can also produce summaries, inferences, comparisons, sentiment analysis, and translations into other languages (Hellstrom 2024, 2). It handles follow-up questions with ease, acknowledges mistakes, challenges incorrect assumptions, refuses inappropriate requests, and, most importantly, with ongoing human input, continuously improves its performance (Masoudi 2024, 64).

Research shows that this AI tool, through its advanced algorithms and natural language use, has significant potential to improve students’ writing ability by offering grammar corrections, suggestions, and comprehensive feedback (Osorio 2023, in Masoudi 2024, 65), i.e., by supplying ideas as well as final proofreading and editing of written material (Imran and Almusharraf 2023, 2). A crucial factor contributing to its widespread use in education is that today’s students, as digital natives, are accustomed to technology in their daily lives (Briggs 2018; Mun 2024), and they find the tool uncomplicated and straightforward to use. In the following sections, we explore the benefits and risks of incorporating ChatGPT in academic writing, as well as students’ perceptions of this issue, by discussing the findings and insights of several recent studies conducted in diverse academic settings.
3 Review of Recent Research

3.1 The Benefits and Risks of Incorporating ChatGPT in Academic Writing

Although some researchers claim that there is a serious lack of comprehensive empirical research confirming ChatGPT’s immense potential for augmenting language learners’ skills (Barrot 2023, in Mun 2024; Nguyen, Ngoc, and Dan 2024; Artiana and Fakhrurriana 2024; Yang 2024; Özçelik and Ekşi 2024; Su et al. 2023, in Mahapatra 2024), there is no denying that the number of studies dealing with this issue and contributing to this discussion has been growing exponentially in recent years.

[1] ChatGPT was initially released by OpenAI in 2018. The significant advances in the model, however, led to the release of the ChatGPT-3.5 model in November 2022, and the ChatGPT-4 model in March 2023.

The findings of a vast pool of recent studies indicate that, if used appropriately, this large generative language model can genuinely and substantially improve students’ writing capabilities (Sawangwan 2024; Mun 2024; Khampusaen 2025). This AI-driven tool has been labelled a real game-changer in language education, primarily because it is very student-friendly and can provide more need-based or personalised assistance than similar tools (Rudolph, Tan, and Tan 2023, 350). More specifically, its real expertise in the context of writing lies in its ability to respond to user queries regarding various aspects of writing by offering suggestions, functioning as a support-on-demand tool, admitting mistakes and rectifying itself (Mahapatra 2024, 3). In essence, its main advantage is that it supports student writing by providing direction on both the content and the organisation of the writing assignment at all phases of writing (Chan and Hu 2023). In the pre-writing phase, ChatGPT alleviates the process of writing (Stokel-Walker 2022, in Mahapatra 2024, 3), primarily by generating ideas (Lingard 2023, in Mahapatra 2024, 3).
In fact, ChatGPT serves as “an invaluable writing assistant which offers prompt responses and assists in brainstorming sessions” (Nguyen, Ngoc, and Dan 2024, 182) by generating new ideas for writing assignments, suggesting “topics, themes, and perspectives that they might not have considered otherwise” (Kasneci et al. 2023; Taecharungroj 2023, in Imran and Almusharraf 2023, 3), or by expanding upon users’ topics, presenting new aspects of their ideas, or providing contextually relevant suggestions (Bhatia 2023, in Nguyen, Ngoc, and Dan 2024, 182). All of these ‘interventions’ aid students “in overcoming their initial writer’s block, and in fostering their creativity, during the initial stages of writing” (Nguyen, Ngoc, and Dan 2024, 182).

After the completion of the pre-writing stage, ChatGPT can be employed to provide corrective feedback (Dai et al. 2023, in Mahapatra 2024) on text organisation, especially on the logical organisation of content and thoughts, the addition of appropriate supporting details, the inclusion of suitable concluding remarks (Fitria 2023), the provision of logical connections between paragraphs (Nugroho, Putro, and Syamsi 2023), and the enhancement of writing mechanics, such as spelling, capitalization, and punctuation (Zirar 2023).

During the actual process of writing, ChatGPT’s corrective feedback can also target language use and grammar (Nguyen 2023) as well as vocabulary (Wang and Guo 2023). In other words, ChatGPT can provide access to grammar materials on various topics such as tenses, active and passive sentences, gerunds, infinitives, and the syntactic structure of sentences. It can also suggest appropriate vocabulary choices by providing synonyms and alternatives for words and phrases, which can be extremely helpful for non-native English speakers in their quest to express their ideas (Huang and Tan 2023, 1150–51).
ChatGPT can work as “an alternative to dictionaries and model more advanced use of foreign learning” in the context of writing (Mun 2024, 27). Furthermore, during the writing phase, this chatbot can also be used to ensure that students are using the appropriate style and tone for their specific writing assignment (Hellstrom 2024). Namely, ChatGPT can improve “the formality and clarity of their writing, ensuring a more accurate presentation of their ideas” (Nguyen, Ngoc, and Dan 2024, 184).

In the revision phase, language students can utilize ChatGPT for editing and proofreading purposes. While editing is mostly concerned with clarity and concision and with correcting wordiness, proofreading targets the final polishing of verb constructions, punctuation, grammar, and spelling (Diamond and Allen 2024; Dobrin 2023).

In addition to these features – the generation of ideas, assistance with content and structure organisation, and language editing and proofreading – ChatGPT can help detect plagiarism by comparing a given text to existing published sources, thereby verifying its originality and determining whether it has been copied from other works (Huang and Tan 2023). Additionally, ChatGPT can provide “guidance on proper citation formats” and generate “reference entries for various citation styles” (Jarrah, Wardat, and Fidalgo 2023, in Nguyen, Ngoc, and Dan 2024, 184).

The only prerequisite for obtaining adequate assistance from ChatGPT is for students to be trained in proper “prompt engineering”, which essentially means entering precise and concise instructions into ChatGPT’s input box (Diamond and Allen 2024; Dobrin 2023). Effective “prompt engineering” is vital at all stages of the writing process (Diamond and Allen 2024; Hellstrom 2024).
Well-crafted prompts help to avoid vague or generic responses, ensure accuracy, and prevent ChatGPT from generating offensive or misleading content. Diamond and Allen (2024), Dobrin (2023), and Skrabut (2023) call for the continuous refinement of prompts based on the feedback received. To save time and enhance the efficiency of all writing phases, students are advised to build a library of specialized prompts to which they can constantly refer (Diamond and Allen 2024; Peachey 2023).

Given all the abovementioned insights from previous studies, it is safe to conclude that ChatGPT constitutes an invaluable tool capable of providing users with a solid foundation for their writing assignments. When employed effectively, it holds the potential to significantly enhance students’ academic writing experience by offering both useful guidance and feedback (Raheem et al. 2023, in Nguyen, Ngoc, and Dan 2024, 179).

Despite these considerable benefits, students must be consistently reminded that ChatGPT should serve as a supplemental tool – specifically, as a writing assistant – rather than a content creator that diminishes their role or, even worse, entirely replaces their input (Mun 2024; Barrot 2023; Tran, Ngan, and Uyen 2025; Nguyen, Ti, and Hoa 2025). Put differently, students should embrace the idea that while machines can help construct good writing, humans are still the main actors controlling the flow of the writing process (Sumakul, Hamied, and Sukyadi 2021). Current research constantly draws attention to the dangers that ChatGPT’s use can pose in the context of academic writing if it is not treated solely as an assistant. The most obvious negative ramification of student overreliance on ChatGPT is a diminished ability to learn and develop their own writing skills, since students could get used to obtaining ready-made texts (Mun 2024).
The same goes for their ability to detect and correct their mistakes and to develop their creative and critical thinking skills (Kornfeld and Roy 2021, in Tran 2024; Nguyen, Ti, and Hoa 2025). Chatbots’ limitations in interpretative and nuanced tasks have also been well documented. For instance, Hazemali et al. (2024) demonstrated that chatbots often falter when tasked with complex contextual analyses, such as drawing cause-and-effect relationships in historical document reviews. This highlights the need for human oversight to ensure accuracy and depth in academic writing. These genuine threats to learners’ development of critical thinking and writing abilities have impelled a number of teachers and school administrators to perceive ChatGPT as the opening of Pandora’s box (Hong 2023, in Sawangwan 2024, 1). This, in turn, has culminated in educational institutions in some countries announcing outright bans on the use of this chatbot (Reuters 2023, in Sawangwan 2024, 1).

ChatGPT’s potential to threaten academic honesty and ethical conduct (Yan 2023, in Mahapatra 2024) can be observed in the fact that the factual content generated by ChatGPT is sometimes incorrect, so human control and intervention are required (Hellstrom 2024). In fact, ChatGPT, like other generative AI systems, is susceptible to responses known as ‘hallucinations’, which, in essence, are false outputs despite appearing correct. These kinds of responses may occur because of a lack of sufficient information, vague or unclear prompts, limited or overly specific data within a language model, or biased datasets. As a result, they might contain incorrect citations, non-existent sources, or entirely fabricated information (Dobrin 2023). Hence, students are advised to always double-check ChatGPT-generated content for accuracy and relevance by consulting reliable resources (Dobrin 2023; Diamond and Allen 2024; Hellstrom 2024; Hazemali et al. 2024; Nguyen, Ti, and Hoa 2025).
Lastly, ChatGPT can encourage cheating and plagiarism in some students, especially those who struggle with writing assignments (Jen and Salam 2024). In the most apocalyptic scenario, its continuous and indiscriminate use could drastically reduce and change the need for, ability at, and valuation of human writing; in other words, it could drastically decrease trust in the written word, as it would become difficult to prove whether a text was produced by a human being or a machine (Hellstrom 2024).

3.2 Insights from Previous Studies Regarding Student Perceptions on the Use of ChatGPT in Academic Writing

In this section, we discuss the findings of a selection of recent studies dealing with the role of ChatGPT in enhancing various aspects of language students’ writing skills, as well as students’ perceptions of ChatGPT’s ‘interference’ with their writing.

Nguyen, Ngoc and Dan (2024) investigated Vietnamese students’ perceptions of ChatGPT’s usefulness by conducting a questionnaire and interviews, focusing on eight aspects of writing development: vocabulary, grammar, idea generation, organisation, translation, writing style, plagiarism management, and the mechanics of writing. Student responses revealed a moderately positive attitude towards ChatGPT’s use for writing purposes, with the highest ratings given to idea generation, followed by vocabulary, grammar, organisation, and writing style, and notably less pronounced interest in using ChatGPT for plagiarism management, translation, and the mechanics of writing. As to the limitations of using ChatGPT, students voiced concerns about its tendency to produce nonspecific or irrelevant responses, the risk of over-reliance on the tool, and its inability to provide reliable references.
Based on these findings, Nguyen, Ngoc and Dan (2024) concluded that ChatGPT both streamlines the writing process, allowing students to upgrade their argumentative writing skills at a fast pace, and promotes a more engaging and dynamic approach to language acquisition and composition in general.

Similarly, Song and Song (2023) assessed the influence of ChatGPT on the writing abilities and motivation of Chinese EFL students. Using a pre-test and post-test design, they compared the writing skills of 50 students, who were randomly assigned to control and experimental groups. In addition to the tests, semi-structured interviews explored the students’ motivation for and experiences with AI-assisted learning. The results indicated that ChatGPT helped improve vocabulary, grammar, organisation, and idea generation in the experimental group in comparison with those receiving traditional instruction. Students also expressed concerns about AI’s accuracy in certain contexts and the dangers of becoming overly dependent on it.

Yang’s (2024) empirical study explored the impact of ChatGPT on writing proficiency among Chinese EFL learners. Using a qualitative case study approach, the study included Chinese undergraduate students who participated in semi-structured interviews intended to provide in-depth insight into their experiences with ChatGPT. Focusing on the planning and revision phases of the writing process, the study showed that ChatGPT “aids in planning by helping students think deeply, generate ideas, and organize them coherently” (Yang 2024, 176). Furthermore, the study highlights that “during revision, it provides feedback on grammar, spelling, and structure, refining expressions and producing polished writing” and that “students reported enhanced creative thinking and improved essay coherence and readability” (Yang 2024, 176).
Given these results, Yang (2024) concluded that integrating ChatGPT into writing instruction can effectively enhance students’ writing outcomes.

ChatGPT’s impact on the acquisition of register knowledge across various writing tasks among undergraduate students in Turkey was explored by Özçelik and Ekşi (2024). The students were asked to complete writing assignments, which were then checked by ChatGPT for corrections and suggestions. The researchers trained students in prompt engineering to help them achieve better results from ChatGPT. The study found that ChatGPT helped students overcome their initial reluctance to engage in writing tasks. It was particularly useful for acquiring formal register knowledge but less effective for teaching neutral register or informal writing.

In Mahapatra’s (2024) study, ChatGPT was examined as a feedback tool for the academic writing skills of undergraduate ESL students in a large Indian university classroom. His mixed-methods intervention involved pre-tests, post-tests, and delayed tests, and Mahapatra established that the employment of ChatGPT as a feedback tool had a substantially positive impact on students’ academic writing proficiency. The students expressed overwhelmingly favourable opinions about the tool, on the basis of which Mahapatra (2024) concluded that ChatGPT can serve as a dependable feedback tool for academic writing assignments.

Mun (2024), on the other hand, conducted a study among Korean EFL college students to understand how they used ChatGPT in essay writing and how they perceived its usefulness. The students were organised into an experimental group and a control group. They were given instructions by the same instructor, used the same course materials and syllabus, and underwent the same examinations. The participants took a pre-test and a post-test, during which they wrote an essay expressing their viewpoints on a selected topic.
The participants in the experimental group received instructions for writing adequate prompts and were told to use ChatGPT to individually proofread and revise their drafts. They submitted their second drafts after receiving feedback from ChatGPT, whereas the students from the control group submitted their drafts after receiving peer feedback in class. The findings of this study revealed a highly positive sentiment towards ChatGPT overall, with students perceiving it as a valuable and effective tool for English writing and language learning. They particularly pointed out its ease of use, convenience, and positive impact on grammar, vocabulary, and content organisation. Furthermore, the results indicated significantly improved writing performance in the experimental group compared to the control group. More precisely, according to Mun (2024, 36), the students in the experimental group exhibited “enhanced post-test writing quality in both structural and linguistic aspects, which surpassed considerably their pre-test scores”.

The perspectives of Indonesian EFL undergraduate students on using ChatGPT in academic writing were explored by Artiana and Fakhrurriana (2024) in a qualitative study. The participants used ChatGPT in their writing assignments, and the data was collected through observation, in-depth interviews and an analysis of academic writing tasks produced by the students. The researchers sought to assess the writing quality, language use, and developmental progress in academic writing among students using ChatGPT as a writing aid. The study revealed that ChatGPT accelerated the writing process, alleviated pressure, and helped students produce more fluent and better-structured texts. Students appreciated its assistance with idea organisation and argument construction, as well as its ability to offer alternative suggestions and phrasing options.
The integration of ChatGPT into the English language writing curriculum in Thai EFL universities was investigated by Sawangwan (2024). This study found that ChatGPT contributed to significant improvements in students’ proficiency, which moved from the B1 level to C1 according to the CEFR. Sawangwan (2024) also emphasized the evolving role of teachers as facilitators who guide students in the use of AI tools by providing technical support, establishing writing criteria, and offering ethical guidance. This shift in the role of teachers from “being completely in charge” to “being mere facilitators” allows them to focus more on curriculum development and personalized support, ultimately enhancing students’ writing performance (Sawangwan 2024, 14).

Rahmi et al. (2024) reported in their study that while Indonesian students generally viewed AI tools like ChatGPT quite favourably, they did note some drawbacks, including the tool’s lack of intentionality and its failure to replicate the nuances of human thought. Students felt that AI-generated text often lacked a “human touch” and could produce content that was predictable, stylistically inconsistent, or irrelevant to the topic.

Another study that underlines serious drawbacks of using AI tools in the context of academic writing is Tran, Ngan and Uyen’s (2025). This study focused on a group of postgraduate students majoring in English in Vietnam and their experiences with AI.
Interestingly, these students, in addition to the benefits, which mostly took the form of improved writing skills and immediate support, also underlined serious drawbacks: difficulty logging in and signing up for accounts when using AI tools; costly subscriptions and unstable Internet connections; the real danger of becoming overly reliant on AI-generated content and losing one’s thinking and writing skills; and, finally, the challenge of integrating AI-generated texts into one’s own writing while preserving one’s academic voice (Tran, Ngan, and Uyen 2025, 87).

Although this section does not provide a comprehensive overview of all current studies on AI-assisted writing, the available findings indicate that EFL students from diverse academic backgrounds around the world generally express positive attitudes towards the integration of AI tools – particularly ChatGPT – into their academic writing processes. The benefits stressed throughout the studies generally encompass grammar, vocabulary, idea generation, immediate and personalized feedback, register, motivation, proofreading, and editing. A common feature of the analysed studies is their reliance on similar research methodologies, which typically include interviews, questionnaires, analyses of students’ writing assignments, and pre- and post-tests. Moreover, most of these studies capture students’ perceptions over a short period and do not engage in longitudinal research that would track the evolution of students’ experiences and attitudes toward the use of AI tools in academic writing contexts. While the primary focus of the reviewed studies is on the benefits related to the content and structure of student writing, many also address notable drawbacks such as the potential for over-reliance on AI, the production of vague or irrelevant responses, and the inability of AI to replicate the nuances of human thought.
Nonetheless, the consensus across the studies is that the benefits outweigh the risks, and that the topic warrants further scholarly attention.

4 Conclusion

On the basis of the discussion above, it can be inferred that researchers have paid considerable attention to the application of ChatGPT in academic writing, despite the relative novelty of this AI tool. Given the complexity and high relevance of writing as one of the main language skills, this focus is unsurprising. The review of recent literature reveals that ChatGPT indeed holds significant promise as a tool for enhancing academic writing, particularly in the context of English language learning. Studies show that, when used effectively and ethically (with proper student training), ChatGPT has many benefits. It can support students in various stages of the writing process, from idea generation to revision, providing guidance on content, structure, grammar, and vocabulary, all while improving motivation. The advantages of using it include its role in facilitating brainstorming, improving writing mechanics, and providing corrective feedback. These advantages apply to both non-native and native speakers of English.

However, the integration of ChatGPT into academic writing is not without risks. Recent studies highlight that overreliance on the tool may hinder the development of students’ critical thinking, creativity, and self-editing skills. Additionally, there is a potential for academic dishonesty, as students might use it as a shortcut to complete writing assignments or to bypass the writing process entirely. The tool’s limitations in the form of occasional inaccuracies and “hallucinations” emphasize the need for students to exercise caution and verify the information generated by ChatGPT. Regarding students’ perspectives, the latest studies show that, in general, English language students from a range of academic backgrounds embrace this tool in their language acquisition process.
They report a positive impact on their writing proficiency, particularly in the planning, drafting and revision phases. It is of paramount importance to mention that students also display acute awareness of the downsides of using ChatGPT. In that context, they particularly underline its lack of nuanced, human-like language, occasional stylistic inconsistencies, shortcomings in the use of informal and neutral register, and difficulties logging in and signing up. Ultimately, the findings and insights gained from these studies show that while ChatGPT offers substantial support, it should be viewed as a supplemental tool, not as a replacement for the students’ own effort and intellectual engagement. Universities and language instructors must guide students in using AI tools responsibly, ensuring that these complement rather than replace student learning and development in academic writing. Thus, for instance, in the pre-writing phase, students should be encouraged to do the brainstorming independently first, and only then ask AI tools to generate additional ideas. Also, in the writing and revision phases, students should be instructed to be persistent in verifying the truthfulness and reliability of AI-generated content.

A major recommendation for future studies is to include longitudinal research that examines potential changes in students’ experiences with and attitudes toward the use of ChatGPT in writing contexts. Additionally, future research could address unresolved questions, such as how educators can train students to use ChatGPT ethically and whether universities should implement specific regulations to address the ethical challenges associated with using AI in writing assignments.

References

Artiana, Nisa, and Ria Fakhrurriana. 2024. “EFL undergraduate students’ perspective on using AI-based ChatGPT in academic writing.” Language and Education Journal 9 (1): 1–11.

Barrot, Jessie. 2023.
“Using ChatGPT for second language writing: Pitfalls and potentials.” Assessing Writing 57: 100745. https://doi.org/10.1016/j.asw.2023.100745.

Briggs, Nell. 2018. “Neural machine translation tools in the language learning classroom: Students’ use, perceptions, and analyses.” The JALT CALL Journal 14 (1): 2–24. https://doi.org/10.29140/jaltcall.v14n1.221.

Chan, Cecilia Ka Yuk, and Wenjie Hu. 2023. “Students’ voices on generative AI: Perceptions, benefits, and challenges in higher education.” International Journal of Educational Technology in Higher Education 20 (43): 1–18. https://doi.org/10.1186/s41239-023-00411-8.

Diamond, Stephanie, and Jeffrey Allan. 2024. Writing AI Prompts for Dummies. John Wiley & Sons.

Dobrin, Sidney. 2023. AI and Writing. Broadview Press.

Farina, Mirko, and Andrea Lavazza. 2023. “ChatGPT in society: Emerging issues.” Frontiers in Artificial Intelligence 6: 1130913. https://doi.org/10.3389/frai.2023.1130913.

Ferris, Dana R. 2018. “Writing in a second language.” In Teaching English to Second Language Learners in Academic Context: Reading, Writing, Listening, and Speaking, edited by Jonathan M. Newton, Dana R. Ferris, Christine C. M. Goh, William Grabe, Fredricka L. Stoller, and Larry Vandergrift, 75–122. Routledge. https://doi.org/10.4324/9781315626949-7.

Fitria, Tira Nur. 2023. “Artificial intelligence (AI) technology in OpenAI ChatGPT application: A review of ChatGPT in writing English essay.” ELT Forum: Journal of English Language Teaching 12 (1): 44–58.

Fryer, Luke, and Rollo Carpenter. 2006. “Bots as language learning tools.” Language Learning & Technology 10 (3): 8–14.

Hazemali, David, Janez Osojnik, Tomaž Onič, Tadej Todorović, and Mladen Borovič. 2024. “Evaluating chatbot assistance in historical document analysis.” Moderna arhivistika 7 (2): 53–83. https://doi.org/10.54356/ma/2024/biub3010.

Hellstrom, Thomas. 2024.
“AI and its consequences for the written word.” Frontiers in Artificial Intelligence 6: 1326166. https://doi.org/10.3389/frai.2023.1326166.

Hu, Krystal. 2023. ChatGPT Sets Record for Fastest-Growing User Base – Analyst Note. Reuters. https://www.reuters.com/technology/chatgpt-sets-record-fastest-growing-user-base-analyst-note-2023-02-01/.

Huang, Jingshan, and Ming Tan. 2023. “The role of ChatGPT in scientific communication: Writing better scientific review articles.” American Journal of Cancer Research 13 (4): 1148–54.

Hyland, Ken. 2003. Second Language Writing. Cambridge University Press. https://doi.org/10.1017/CBO9780511667251.

Imran, Muhammad, and Norah Almusharraf. 2023. “Analyzing the role of ChatGPT as a writing assistant at higher education level: A systematic review of the literature.” Contemporary Educational Technology 15 (4): ep464. https://doi.org/10.30935/cedtech/13605.

Jazbec, Saša. 2024. “Umetna inteligenca oziroma orodja, podprta z umetno inteligenco, pri pouku in za pouk tujih jezikov: empirična raziskava o stališčih učiteljev tujega jezika v Sloveniji.” Ars & Humanitas 18 (1): 115–30. https://doi.org/10.4312/ars.18.1.115-130.

Jen, Ling Shirley, and Abdul Rahim Salam. 2024. “Using artificial intelligence for essay writing.” Arab World English Journal (AWEJ) (April): 90–99. https://doi.org/10.24093/awej/ChatGPT.5.

Karataş, Fatih, Faramarz Yaşar Abedi, Filiz Ozek Gunyel, Derya Karadeniz, and Yasemin Kuzgun. 2024. “Incorporating AI in foreign language education: An investigation into ChatGPT’s effect on foreign language learners.” Education and Information Technologies 29 (15): 19343–66. https://doi.org/10.1007/s10639-024-12574-6.
Kasneci, Enkelejda, Kathrin Sessler, Stefan Küchemann, Maria Bannert, Daryna Dementieva, Frank Fischer, Urs Gasser, Georg Groh, Stephan Günnemann, Eyke Hüllermeier, Stephan Krusche, Gitta Kutyniok, Tilman Michaeli, Claudia Nerdel, Jürgen Pfeffer, Oleksandra Poquet, Michael Sailer, Albrecht Schmidt, Tina Seidel, Matthias Stadler, Jochen Weller, Jochen Kuhn, and Gjergji Kasneci. 2023. “ChatGPT for good? On opportunities and challenges of large language models for education.” Learning and Individual Differences 103: 102274. https://doi.org/10.1016/j.lindif.2023.102274.

Khampusaen, Dararat. 2025. “The impact of ChatGPT on academic writing skills and knowledge: An investigation of its use in argumentative essays.” LEARN Journal: Language Education and Acquisition Research Network 18 (1): 963–88. https://doi.org/10.70730/PGCQ9242.

Khatter, Sanaa. 2019. “An analysis of the most common essay writing errors among EFL Saudi female learners.” Arab World English Journal 10 (3): 364–81. https://doi.org/10.24093/awej/vol10no3.26.

Kohnke, Lucas, Benjamin Luke Moorhouse, and Di Zou. 2023a. “ChatGPT for language teaching and learning.” RELC Journal 54 (2): 537–50. https://doi.org/10.1177/00336882231162868.

Mahapatra, Santosh. 2024. “Impact of ChatGPT on ESL students’ academic writing skills: A mixed methods intervention study.” Smart Learning Environments 11: 9. https://doi.org/10.1186/s40561-024-00295-9.

Malá, Marketa, Gabriela Brůhová, and Katerina Vašků. 2022. “Reporting verbs in L1 and L2 English novice academic writing.” ELOPE: English Language Overseas Perspectives and Enquiries 19 (2): 127–47. https://doi.org/10.4312/elope.19.2.127-147.

Masoudi, Hatim. 2024. “Effectiveness of ChatGPT in improving English writing proficiency among non-native English speakers.” International Journal of Educational Sciences and Arts 3 (4): 62–84. https://doi.org/10.59992/IJESA.2023.v3n4p2.

Mun, Chae-young. 2024.
“EFL learners’ English writing feedback and their perception of using ChatGPT.” Journal of English Teaching Through Movies and Media 25 (2): 26–39. https://doi.org/10.16875/stem.2024.25.2.26.

Nguyen, Ho Huynh Bao, Ho Huynh Bao Ngoc, and Thai Cong Dan. 2024. “EFL students’ perceptions and practices of using ChatGPT for developing English argumentative essay writing skills.” European Journal of Alternative Education Studies 9 (1): 168–216. https://doi.org/10.46827/ejae.v9i1.5341.

Nguyen, Thi Thu Hang. 2023. “EFL teachers’ perspectives toward the use of ChatGPT in writing classes: A case study at Van Lang University.” International Journal of Language Instruction 2 (3): 1–47. https://doi.org/10.54855/ijli.23231.

Nguyen, Thi Yen Phuong, Nguyen Ngoc Ti, and Phan Nguyen Khanh Hoa. 2025. “The challenges of applying ChatGPT in the academic writing of postgraduate students in English major at IUH.” International Journal of AI in Language Education 2 (1): 20–37. https://doi.org/10.54855/ijaile.25212.

Nugroho, Arif, Nur Hidayanto Pancoro Setyo Putro, and Kastam Syamsi. 2023. “The potentials of ChatGPT for language learning: Unpacking its benefits and limitations.” Register Journal 16 (2): 224–47. https://doi.org/10.18326/register.v16i2.224-247.

Nunan, David. 2003. Practical English Language Teaching. McGraw Hill Education.

Özçelik, Nermin Punar, and Gonca Yangın Ekşi. 2024. “Cultivating writing skills: The role of ChatGPT as a learning assistant – a case study.” Smart Learning Environments 11: 10. https://doi.org/10.1186/s40561-024-00296-8.

Peachey, Nick. 2023. ChatGPT in the Language Classroom. Peachey Publications.

Rahmat, Noor Hanim, Mazlen Arepin, D. Rohayu Mohd Yunos, and Sharifah Amani Syed Abdul Rahman. 2017. “Analyzing perceived writing difficulties through the social cognitive theory.” PEOPLE: International Journal of Social Sciences 3 (2): 1487–99. https://doi.org/10.20319/pijss.2017.32.14871499.
Rahmi, Regina, Zahria Amalina, Andriansyah Andriansyah, and Adrian Rodgers. 2024. “Does it really help? Exploring the impact of AI-generated writing assistant on the students’ English writing.” Studies in English Language and Education 11 (2): 998–1012. https://doi.org/10.24815/siele.v11i2.35875.

Richards, Jack C., and Willy A. Renandya. 2002. Methodology in Language Teaching: An Anthology of Current Practice. Cambridge University Press. https://doi.org/10.1017/CBO9780511667190.

Rudolph, Jurgen, Samson Tan, and Shannon Tan. 2023. “ChatGPT: Bullshit spewer or the end of traditional assessments in higher education?” Journal of Applied Learning & Teaching 6 (1): 342–63. https://doi.org/10.37074/jalt.2023.6.1.9.

Sari, Eka Dyah Puspita, and Mia Fitria Agustina. 2022. “Thematic development in students’ argumentative essay.” IDEAS: Journal on English Language Teaching and Learning, Linguistics and Literature 10 (1): 166–74.

Sawangwan, Sirin. 2024. “ChatGPT vs teacher roles in developing EFL writing.” International Journal of Computer-Assisted Language Learning and Teaching (IJCALLT) 14 (1): 1–21. https://doi.org/10.4018/IJCALLT.361235.

Skrabut, Stan. 2023. 80 Ways to Use ChatGPT in the Classroom: Using AI to Enhance Teaching and Learning. Stan Skrabut.

Song, Cuiping, and Yanping Song. 2023. “Enhancing academic writing skills and motivation: Assessing the efficacy of ChatGPT in AI-assisted language learning for EFL students.” Frontiers in Psychology 14: 1–14. https://doi.org/10.3389/fpsyg.2023.1260843.

Sumakul, Dian Toar Y. G., Fuad Abdul Hamied, and Didi Sukyadi. 2021. “Students’ perceptions of the use of AI in a writing class.” Advances in Social Science, Education and Humanities Research 624: 52–57. https://doi.org/10.2991/assehr.k.220201.009.

Tica, Lena, and Ivana Krsmanović. 2024. “Overcoming the writer’s block?
Exploring students’ motivation and perspectives on using ChatGPT as a writing assistance tool in ESP.” ELOPE: English Language Overseas Perspectives and Enquiries 21 (1): 129–49. https://doi.org/10.4312/elope.21.1.129-149.

Tran, Hong Ngoc, Le Thi Thuy Ngan, and Tran Vu Bich Uyen. 2025. “AI tools in learning academic writing: Benefits and challenges for MA students in the English language studies at the Industrial University of Ho Chi Minh City.” International Journal of AI in Language Education 2 (1): 74–91. https://doi.org/10.54855/ijaile.25215.

Tran, Thi Thu Hien. 2024. “AI tools in teaching and learning English academic writing skills.” Proceedings of the AsiaCALL International Conference 4: 170–87. https://doi.org/10.54855/paic.23413.

van Dis, Eva A. M., Johan Bollen, Willem Zuidema, Robert van Rooij, and Claudi L. Bockting. 2023. “ChatGPT: Five priorities for research.” Nature 614 (7947): 224–26. https://doi.org/10.1038/d41586-023-00288-7.

Wang, Mengqian, and Wenge Guo. 2023. “The potential impact of ChatGPT on education: Using history as a rearview mirror.” ECNU Review of Education 1 (8). https://doi.org/10.1177/20965311231189826.

Yang, Yang. 2024. “An empirical study on the impact of ChatGPT on writing proficiency in Chinese EFL learners.” Curriculum and Teaching Methodology 7 (4). https://doi.org/10.23977/curtm.2024.070425.

Zirar, Araz. 2023. “Exploring the impact of language models, such as ChatGPT, on student learning and assessment.” Review of Education 11 (3): e3433. https://doi.org/10.1002/rev3.3433.

2025, Vol.
22 (1), 69-91(228) journals.uni-lj.si/elope https://doi.org/10.4312/elope.22.1.69-91 UDC: [811.111’243:378]:004.912

Rashmika Lekamge, Sabaragamuwa University of Sri Lanka, Sri Lanka
Clayton Smith, University of Windsor, Canada

Impact of Auto-Correction Features in Text-Processing Software on the Academic Writing of ESL Learners

ABSTRACT

The intrusion of technology into language education is undeniable. However, its impact on English as a Second Language (ESL) learners remains underexplored. This study explores how the text-processing and suggestion features of Microsoft Word affect the English language development of ESL learners. The writing samples show that while beginners make fewer spelling and punctuation errors, prolonged reliance on software weakens long-term language proficiency. This finding is supported by cluster analysis of first-year undergraduates, third-year undergraduates, and postgraduates. Conversely, first-year undergraduate learners excel in structuring paragraphs and writing a variety of sentences, which are areas untouched by the automation offered in the tested software. Semi-structured interviews with research-active academics and postgraduate students further validated these findings, highlighting a critical decline in writing confidence due to over-dependence on emerging technology. The study underscores the hidden costs of convenience, urging a recalibration of technology-integrated language pedagogy.

Keywords: automated writing correction, ESL development, technology dependence, writing proficiency decline, text-processing software

Vpliv funkcije samodejnega popravljanja v programih za urejanje besedil na akademsko pisanje učencev in učenk angleščine kot drugega tujega jezika

IZVLEČEK

Vdor tehnologije v učenje jezikov je nesporen, a je njen vpliv na angleščino kot drugi tuji jezik premalo raziskan.
Študija raziskuje, kako funkcije samodejnega popravljanja in predlogov v programu Microsoft Word vplivajo na razvoj znanja angleščine pri študentih in študentkah angleščine kot drugega tujega jezika. Pisni vzorci so pokazali, da začetniki naredijo manj napak pri črkovanju in ločilih, a dolgotrajna odvisnost od programske opreme oslabi dolgoročno jezikovno znanje. To potrjuje analiza prvega letnika, tretjega letnika in podiplomskih študentov in študentk. Prvi letnik se je sicer izkazal pri strukturiranju odstavkov in variiranju povedi, ki ju programska oprema ne avtomatizira. Polstrukturirani intervjuji z raziskovalci in raziskovalkami in podiplomskimi študenti in študentkami so ugotovitve potrdili ter izpostavili občuten upad samozavesti pri pisanju zaradi pretirane odvisnosti od sodobne tehnologije. Raziskava izpostavi skriti davek udobja in nujnost ponovnega uravnoteženja pedagoških pristopov pri vključevanju tehnologije v jezikovni pouk.

Ključne besede: samodejno popravljanje pri pisanju, razvoj ESL, tehnološka odvisnost, upad pisne zmožnosti, programska oprema za obdelavo besedil

1 Introduction

Ongoing social and technological development has introduced auto-correction features integrated into text processors, continuously reshaping the language competence of English as a second language (ESL) users. Auto-correction refers to digital spelling and grammar correction tools embedded in word-processing programs, which automatically detect errors and either suggest corrections or directly rectify them (Wood 2014). These features were first implemented in the 1980s as a strategy to boost the demand for computers (Cummings 2023; Kruse and Rapp 2023; Larsson and Teigland 2020; Steyn and Johanson 2011).
At the time, spell-checker programs were predicted to become a mandatory feature of future text-processing programs. The service provides users with correct spelling or grammar before they can even recognize their mistakes; later, however, it emerged that these programs negatively influenced the writing and language performance of their users (Baron 2023; Omer Ismael et al. 2022; Rüdian, Dittmeyer, and Pinkwart 2022). Scholars have identified that the auto-correction feature in word-processing programs operates through three primary mechanisms: (a) direct corrections, which provide the corrected form of an error on the spot; (b) indirect corrections, which direct users’ attention to the errors but leave users to select the correct option; and (c) metalinguistic corrections, where the program identifies the errors, labels them based on their nature and provides a brief explanation, with or without relevant examples (Barrot 2023). However, technology and language experts hold conflicting views regarding the impact of these features on the language competences of English learners beyond the inner circle of Kachru’s (1985) ‘Three Concentric Circles’ model. Accordingly, the inner circle represents countries where English is the primary language (e.g., the UK, the USA), the outer circle includes ESL contexts where English serves as an institutionalized additional language (e.g., India, Nigeria), and the expanding circle comprises EFL (English as a Foreign Language) contexts where English is learned as an international language but lacks official status (e.g., China, Saudi Arabia). The negative effects of auto-correction may be more pronounced in outer and expanding circle countries, where learners often rely on normative standards from inner-circle varieties, potentially influencing their linguistic development in ESL (norm-developing) and EFL (norm-dependent) contexts (A. Al-Mutairi 2019; Hu and Jiang 2011).
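For readers unfamiliar with these feedback types, the three correction mechanisms described above can be sketched in a few lines of code. The following is a purely illustrative toy example, not how any real word processor is implemented: the tiny dictionary, the function names, and the similarity threshold are all hypothetical choices made for this sketch.

```python
# Toy illustration of the three auto-correction mechanisms (Barrot 2023):
# (a) direct, (b) indirect, and (c) metalinguistic correction.
# Dictionary and names are hypothetical; real spell checkers are far richer.
import difflib

DICTIONARY = {"their", "there", "receive", "separate", "writing"}

def best_match(word):
    """Return the closest dictionary word, or None if nothing is similar."""
    matches = difflib.get_close_matches(word.lower(), DICTIONARY, n=1, cutoff=0.7)
    return matches[0] if matches else None

def correct(word, mode="direct"):
    if word.lower() in DICTIONARY:
        return word                       # no error detected
    suggestion = best_match(word)
    if suggestion is None:
        return word                       # unknown word, no suggestion
    if mode == "direct":                  # (a) silently replace the error
        return suggestion
    if mode == "indirect":                # (b) flag it, let the user choose
        return f"{word} [did you mean: {suggestion}?]"
    if mode == "metalinguistic":          # (c) flag, label, briefly explain
        return f"{word} [spelling error; closest dictionary form: ‘{suggestion}’]"
    return word

print(correct("recieve", "direct"))       # -> receive
print(correct("recieve", "indirect"))
print(correct("recieve", "metalinguistic"))
```

The pedagogical distinction the literature draws maps directly onto the `mode` parameter: only mode (a) removes the learner from the correction loop, which is the scenario the studies below associate with weaker internalization of spelling rules.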
One critical statement claims that this technology has ‘created a generation of dummies’ (Wood 2014, 12). On the contrary, Weigle (2013) argues that providing corrective feedback is essential, as it addresses deficiencies in students’ linguistic repertoire, particularly in advanced writing, where such deficiencies can be corrected easily. This line of research also points to quicker and potentially more effective methods for improving academic writing through auto-correction features integrated into text-processing tools. Studies have validated the advantage of the auto-correction features in text-processing applications, as they enable users to compose relatively error-free text (Neto, Bezerra, and Toselli 2020; Putze et al. 2017). Moreover, the software determines which words are most likely to have been intended and then fixes the text accordingly, which enhances accuracy in the technical aspects of writing, improves readability and consistency throughout the document, and reduces the cognitive load of writing, allowing greater focus on content and ideas (Sanchez et al. 2023).

However, continuous reliance on these features has intensified challenges for ESL and EFL learners, particularly in spelling, note-taking, and instant essay writing (Omer Ismael et al. 2022; Sanchez et al. 2023). It affects students’ writing abilities because users often fail to notice their mistakes on account of automatic correction (Kontogiannis 1999). As a result, users do not fully internalize spelling and grammar rules (Ajaj 2022), leading to weaker writing skills over time. This is particularly detrimental for students in ESL contexts since, unlike learners in inner-circle countries (Hu and Jiang 2011), they lack English language input from the broader society and environment (Saud et al. 2023).
It is thus vital to explore the impact of the auto-correction features in text-processing applications on ESL learners’ language competency. The present study focuses on the effects of auto-correction technology on the English writing skills of ESL undergraduates. Insights from the on-site test experience and interviews with research-active academics and postgraduate participants will ensure the quality and validity of the findings. The findings of this study will assist in identifying the long-term impact of text processors with auto-correction and serve as an initial step towards potential technological developments that could overcome these drawbacks. The following section reviews the existing literature relevant to our study.

2 Literature Review: Empirical Studies on Auto-Correction

While text-processing applications with automatic corrective feedback can enhance language learning, their use in ESL contexts has also shown harmful effects. Kim (2012) stated that error correction is ineffective and harmful in physically interactive learning environments. Similarly, studies have indicated that it is ineffective to expect language development through automatic correction, since the student/learner is unaware of the error he/she has committed (Truscott 1999). Despite concerns about grammar or spelling, students often fail to recognize their errors and misunderstandings. On the contrary, Rüdian et al. (2022) claim that auto-correction is a promising tool for ESL language users, as it helps to minimize the gap between teacher expectations and learner skills; nevertheless, over 66% of the errors identified by educators in their study were not detected by the auto-correction software (designed for the German language). The study also found that the software frequently flagged correct items as errors, causing confusion and undermining its reliability.
These findings highlight the need for auto-correction tools to move beyond basic proofreading toward a more comprehensive approach to language learning (Alharbi 2023). Another study found that automated feedback systems improve writing quality and outcomes, but it unveiled shortcomings in ESL/EFL contexts (Benali 2021). Further, Ferris and Roberts (2001) explored effective methods for providing error correction feedback, but their study highlighted the scarcity of existing literature on the impact of auto-correction on English language learners in ESL contexts.

Another significant study examined the impact of automatic spelling correction, focusing on learners’ awareness of how the software functions and corrects spelling mistakes, its educational value for learning English spelling, and learners’ dependence on the tool for checking their spelling (Lin, Liu, and Paas 2017). The study found that male learners were generally more competent and benefited more from the software, with the sample showing an overall positive preference for the auto-correction feature (Rahimi, Gholizadeh, and Shahryari 2019). Conversely, the study by Ali et al. (2022) found that learners who relied heavily on technology for spelling correction faced greater challenges in maintaining spelling accuracy during writing tasks. Supporting this argument, Wood (2014) highlighted a survey which revealed that students who heavily depended on smart devices performed worse in spelling than those who had less interaction with such devices and spell-check features. She further explained that millennials, as an emerging generation, often lack three essential skills: reading, writing, and metalinguistic awareness (Vodopija-Krstanović and Brala Vukanović 2015; Wood 2014). The study by Gayed et al.
(2022) explored the impact of writing tools on L2 writing proficiency, highlighting broader implications of auto-correction technology for ESL learners’ language development. While the findings indicated a positive impact on syntactic complexity, results for other measures and production rates were inconclusive. The authors attributed these inconsistencies to several factors, including the participants’ limited experience with the software, low usage rates of the word suggestion feature, and the inherent limitations of machine-based assessment tools. They argued that automated systems might fail to capture nuanced writing features, such as contextual and structural errors, which human evaluators can better identify. These findings highlight important considerations for implementing auto-correction tools in ESL contexts. While such tools may support immediate syntactic improvements, their influence on deeper language development remains uncertain. Outcomes are shaped by factors such as technological familiarity, typing proficiency, and the ability to interpret corrections. The study emphasizes the need for comprehensive training and ongoing tool refinement. Furthermore, predictive text and word suggestions may distract less proficient learners, requiring closer evaluation. Thus, auto-correction technology, though promising, must be carefully aligned with learners’ needs and proficiency levels. Sanchez (2023) studied how auto-correction tools impact students’ writing abilities across various dimensions, including vocabulary, syntax, and writing mechanics. The findings revealed marginal performance in student composition, suggesting limited mastery in key areas of writing. For instance, vocabulary assessment consistently yielded a marginal rating, indicating a dependency on auto-correction tools for word choice. 
Similar trends were observed in syntax and mechanics, where a sizeable portion of students scored poorly, indicating persistent grammatical and structural errors in their writing despite using these tools. This suggests that while auto-correction provides immediate corrective feedback and improves surface-level errors, it may inhibit learners from developing deeper linguistic skills. Furthermore, participants often relied on auto-correction tools not just for error correction but also as idea generators and time-savers, reflecting a shift in focus towards ease and efficiency over language mastery. These findings underscore that auto-correction tools boost confidence and efficiency but may hinder authentic language learning and critical thinking when overused.

Research indicates that integrating automated feedback with traditional teacher feedback improves ESL learners’ writing skills. A recent study on Turkish EFL students found that this combined approach significantly enhanced writing self-efficacy compared to traditional methods (Sari and Han 2024). This aligns with prior studies by Grimes and Warschauer (2010) and Sherafati et al. (2020), which revealed that automated writing evaluation (AWE) tools, by providing immediate, personalized feedback, promote self-efficacy and learner engagement. These systems enable students to practice without time or space constraints, thereby enhancing confidence and allowing for iterative revisions. However, mixed results were observed in other areas, such as self-regulated writing strategies and writing anxiety. While the combined feedback model did improve self-regulation, it did not significantly reduce anxiety levels, possibly because of the continued role of teacher evaluation.
Despite these variations, the immediate and individualized feedback offered by AWE systems has been shown to improve writing performance and facilitate more efficient error correction, suggesting that this hybrid feedback model could contribute positively to language development in ESL contexts. This supports the broader argument that automated feedback can foster a more student-centred and effective learning environment, enhancing both writing proficiency and the psychological factors critical to language acquisition.

Despite the existing empirical evidence, the limited scope and narrow focus of previous studies underscore the need to explore emerging trends associated with technological advancements and their impact on educational practices. A significant concern motivating this study is the overemphasis on Middle Eastern and Western contexts in the literature (Ali et al. 2022; Benali 2021; Omer Ismael et al. 2022; Wood 2014), leaving a gap in research that addresses the Asian context, where findings could be applicable across the broader ESL landscape. Moreover, the post-COVID-19 learning environment in developing countries has introduced significant changes in educational practices, policies, and technology, leading to increased reliance on technology in the teaching-learning process (Bećirović, Brdarević-Čeljo, and Delić 2021). This over-reliance on technology has created a detectable phenomenon among millennials (Shadiev and Wang 2022), which needs immediate exploration. Therefore, this study is timely, as it aims to address this research gap and identify the potential negative consequences of neglecting these issues, contributing both to academic discourse and to socially significant outcomes for future generations.

3 Methodology

3.1 Setting

This study examines the impact of the auto-correction features of text-processing applications on ESL undergraduates’ English language competences.
A pre-designed test was deployed as the research instrument to gather evidence (Ahmed 2024). Furthermore, a series of semi-structured interviews with both undergraduate and postgraduate clusters was conducted to ensure the validity and reliability of the data and to obtain more precise findings (Leung 2015). The study was conducted as a cross-sectional investigation (Maier et al. 2023), in which data were collected at a specific point in time.

3.2 Participants

This experimental study involved 197 undergraduates from a state-governed university in Sri Lanka who regularly engage in academic writing (assignment submissions, end-of-semester examinations, field report composition, spot tests, etc.). The postgraduate cluster, used for the semi-structured interview series, included twenty postgraduate students. The undergraduate cluster of the sample comprised Year I Semester I (hereafter YI SI) and Year III Semester II (hereafter YIII SII) undergraduates from the discipline of spatial sciences. The participants were selected to preserve the validity and reliability of the data (Leung 2015). According to Table 1 below, these undergraduates, aged between 19 and 26, are pursuing their Bachelor of Science in Spatial Sciences and represent both genders. Most of the sample had limited and inconsistent exposure to computers and the internet prior to entering university (Lekamge and Rajavarathan 2024). The subject General English is offered as part of the Advanced Level examination. However, the results of General English were not considered for university admission in this degree programme, leading to significant variation in language competency among undergraduate students. Table 1.
Composition and demography of the undergraduates tested in the experiment and the postgraduates who took part in the semi-structured interviews.

                  Cluster 1 – Undergraduates                         Cluster 2 – Postgraduates
Year of Study     Year I Semester I         Year III Semester II     Postgraduates
Age               19–22                     22–26                    28–40
Gender            Male – 71, Female – 28    Male – 66, Female – 32   Male – 13, Female – 07
Mother Tongue     Sinhala – 78, Tamil – 21  Sinhala – 82, Tamil – 16 Sinhala – 16, Tamil – 04

The Sri Lankan secondary education system predominantly relies on face-to-face learning and a mother tongue (MT/L1) based teaching-learning process (Lekamge, Jayathilake, and Smith 2024), contributing to a low level of computer literacy and English language proficiency for some students in the initial phase of their university education. The students come from two different first-language (L1) backgrounds: the majority speak Sinhala (L1), and the others Tamil (L1) (Prasangani 2018). To ensure consistency in instruction, the researcher served as the teacher for both the undergraduate and postgraduate clusters. Informed consent was obtained from all students before they participated, in accordance with research ethics (Wu et al. 2019).

3.3 Development of the Research Instruments and Data Collection

Once the preliminary list of errors was identified from the existing literature (Hládek, Staš, and Pleva 2020; Nejja and Yousfi 2015), the types of errors were identified and listed. Then, to ensure triangulation for validity and reliability, a pilot study was conducted through an online platform with a randomly selected group of ten undergraduates and five postgraduates. The pilot study was vital to ensure the responsiveness and applicability of the tested items (Reed et al. 2021; Vivek, Nanthagopan, and Piriyatharshan 2023). Then, the finalized error list was composed, and each minor error type was classified under a sub-category for ease of handling the results.
The required terminology and possible error types were included in the tested items. Further, the postgraduate cluster was questioned with reference to the impact caused by long-term use of this software. A. On-site test designed to gauge the real-time impact of auto-correction on writing The first research instrument was the pre-designed test targeting the undergraduates, which consisted of three tasks divided into two subcategories. Task one was designed to assess the hand-written competency of students in academic activities. The tasks aimed to assess student competence in listening to academic content and composing relevant notes, addressing potential errors identified in the pilot study. This included terminology that was subject to auto-correction and technical terms specific to the academic discipline of spatial sciences. The students had to compose a 200-word paragraph within the allocated time for each task. Task two was a computer-based test, which was completed in a notepad application with no auto-correction options. Students were given two parts under the second task and asked to submit the saved notepad answer to the link provided at the end of task two. Task three was designed to be completed in a Microsoft Word document (hereafter, a Word document), with auto-correction features. The task had two parts, requiring students to type their answers in the Word document. Each task carried the same weight as previous tasks and was related to the main academic discipline of spatial science. All three parts of this test were designed to include the required terms that helped assess the spelling and grammar concerns that were tested in the experiment (Catelly 2014; Hair et al. 2024; Ren and Seedhouse 2024). The test design is displayed in Figure 1 below. 
Figure 1. Design of the test.

B. Semi-structured interviews targeting the postgraduate cluster

A series of semi-structured interviews was conducted with twenty postgraduates who were willing to participate in an online session of 10–15 minutes. Qualitative interviews carry higher validity in triangulating the data and obtaining authentic viewpoints and experiences relevant to the study (DeJonckheere and Vaughn 2019; Magaldi and Berler 2020). Thus, the topic areas for the interview were designed with extensive reference to and adaptation of the prevailing literature. Accordingly, the interview questions covered attitudes toward auto-correction and its impact on academic writing, language learning, and development. They also addressed the accuracy and common errors of auto-correction, its effect on writing style and tone, the availability and compatibility of platforms, the challenges and limitations faced by ESL learners, its impact on confidence in English language usage and writing, and long-term changes in writing practices (DeJonckheere and Vaughn 2019; Ranalli and Yamashita 2022; Sanosi 2022; Wei, Wang, and Dong 2023).

3.4 The Procedure of the Study

The first research instrument (the on-site test) involved seven major steps.
Initially, a suitable testing approach was designed for the experiment, with three tasks to test each aspect of the research. To obtain the expected outcomes, the test was divided into three sections, each with its own significance to the main objective of the study:

a) The first part involved handwritten tasks to assess students’ on-site writing competence without technology support. This aimed to reveal their independent language proficiency and confidence, isolating language skills from computer literacy and typing speed.

b) The second part involved notepad-typed content to assess students’ language errors without auto-correction support, though results could be influenced by typing speed and computer literacy.

c) The third part involved a composition typed in Microsoft Word, which offers auto-correction, to identify language issues beyond the software’s corrective capabilities.

These three test sections allowed the researcher to identify gaps at each phase, with comparisons revealing how students increasingly rely on technology-enhanced tools without developing independent language competences. For the on-site test, participants were selected from YI SI and YIII SII undergraduates. The interview series targeted postgraduate students with extensive experience using text-processing software for academic, professional, and research purposes, aiming to explore the long-term effects of continuous software exposure and to enhance the validity of the on-site test results (Kakarash 2023; Morse et al. 2002; Van der Loo and de Jonge 2020). The majority of YI SI students are not well exposed to computer literacy and related technology (De Silva, Kodikara, and Somarathne 2014; Gamage and Halpin 2007; Lekamge and Rajavarathan 2024). The YI SI group was selected for their limited exposure to auto-correction and text-processing software, while the postgraduate students, with the longest exposure, provided stronger validation for the findings.
The analysis focused on how continuous software exposure affects the writing performance of ESL learners. YIII SII students, with considerable exposure to online learning and text-processing tools, offer an intermediate perspective. Thus, the three clusters ensure data accuracy across varying stages of technological exposure (Van der Loo and de Jonge 2020). The third phase involved task preparation, obtaining approvals, and informing students of the test procedures. Postgraduate interviewees were notified two weeks in advance, briefed on the question areas, and invited to participate voluntarily. The fourth and most challenging phase was on-site test administration, conducted by three instructors. Instructions were clearly communicated at the beginning and end of each task, with specific time allocations. Upon completing each task, students submitted their responses via a Google Form. The collected scripts were then reviewed, and purposeful sampling was used to select complete submissions from ninety-eight participants (Naderifar, Goli, and Ghaljaie 2017). The researchers used content analysis to examine error patterns in each task of the undergraduate on-site test (Amnuai 2020; Salehi and Bahrami 2018). The interview scripts were first translated into English and then analysed using qualitative thematic analysis (Braun and Clarke 2006; Kiger and Varpio 2020; Naeem et al. 2023; Rosairo 2023; J. Singh and Eisenschenk 2021) to generate codes, which were later transformed into the themes of the study. Finally, the data from the on-site test and interviews were manually examined to detect existing phenomena with supporting evidence. The data were qualitatively analysed through error analysis and thematic analysis: error types were identified in each test phase, and the findings were compared using descriptive thematic analysis (Baxter 1991; Braun and Clarke 2006; Kiger and Varpio 2020).
4 Discussion and Analysis

The discussion and analysis are organized around two primary concerns: (i) to establish the influence of the automatic spelling correction feature on the writing skills of English language learners in ESL contexts, and (ii) to assess the impact of identifying misspelt words and providing suggestions, as well as the role of grammatical suggestions, as displayed in Table 2. The primary features offered by the software (MS Word) are automatic spelling and grammar correction, as well as suggestions for misspellings, vocabulary, stylistic issues, and other writing errors. Based on the overall performance of the participants, the study identified various persistent error types at each phase of the analysis. These error types were categorized into subcategories and main categories to facilitate data processing and analysis (Figure 2). The categories are as follows: (a) grammar errors – subject-verb agreement, tense consistency, pronoun reference, and sentence structure; (b) spelling errors – grapheme, transposition, substitutions, insertions, omissions, case sensitivity, double letters, and homophones; (c) punctuation errors – misplaced or missing commas, apostrophe errors, and missing periods; (d) vocabulary errors – incorrect word usage, redundancy, clichés, and jargon; (e) stylistic errors – lack of clarity, tone and register issues, and insufficient sentence variety; (f) other errors – repetition, paragraph structure issues, improper citations, and incompleteness.

Figure 2. Identified error categories among the sample (YI SI and YIII SII undergraduates).

Since the focus of this study is the impact of auto-correction features in text-processing applications on the language development of learners in the ESL context, unrelated concerns and error types were excluded from the analysis.
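For readers interested in replicating the categorization step, the six-part taxonomy above can be represented as a simple mapping from main categories to subcategories and used to tally observations per script. This is a minimal illustrative sketch, not the instrument used in the study; the category names follow the list above, and the sample counts are invented for demonstration.

```python
# Illustrative sketch of the section 4 error taxonomy as a data structure.
# Category and subcategory names follow the paper's list (a)-(f); the example
# observations below are invented purely for demonstration.
from collections import Counter

ERROR_TAXONOMY = {
    "grammar": ["subject-verb agreement", "tense consistency",
                "pronoun reference", "sentence structure"],
    "spelling": ["grapheme", "transposition", "substitutions", "insertions",
                 "omissions", "case sensitivity", "double letters", "homophones"],
    "punctuation": ["misplaced or missing commas", "apostrophe errors",
                    "missing periods"],
    "vocabulary": ["incorrect word usage", "redundancy", "cliches", "jargon"],
    "stylistic": ["lack of clarity", "tone and register",
                  "insufficient sentence variety"],
    "other": ["repetition", "paragraph structure", "improper citations",
              "incompleteness"],
}

def tally_by_category(observed_errors):
    """Aggregate subcategory-level observations into main-category counts."""
    lookup = {sub: main for main, subs in ERROR_TAXONOMY.items() for sub in subs}
    return Counter(lookup[e] for e in observed_errors if e in lookup)

# Invented example: errors noted in one hand-written script.
sample = ["omissions", "missing periods", "tense consistency", "omissions"]
print(tally_by_category(sample))  # Counter({'spelling': 2, 'punctuation': 1, 'grammar': 1})
```

Collapsing subcategory observations into the six main categories in this way is what makes the per-task and per-cohort comparisons reported below tractable.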
4.1 Impact of Automatic Correction on Language Performance

The comparison of the three tasks (hand-written, notepad-based, and Word document-based) revealed certain limitations in automatic spelling correction. Specifically, the software does not correct all spelling errors made by users, as it relies on a pre-installed auto-correction word list, which reveals the limited effectiveness of the software (Fitria 2021). While the software often corrects mistakes automatically without user awareness, this feature poses drawbacks for long-term users (Hiscox, Leonavičiūtė, and Humby 2014). Over time, their attention to key language aspects, such as spelling and punctuation, may diminish (Fan and Ma 2022). These findings were further validated by the postgraduate sample. All postgraduate interview participants concurred that prolonged and continuous exposure to text-processing software significantly contributed to spelling and punctuation errors when operating in non-technological environments, as it made them heavily dependent on the technology. These elements are crucial in academic writing, underscoring the profound impact of long-term software use (Brenner et al. 2021; Merzifonluoğlu and Takkaç Tulgar 2023). Table 2 presents compelling evidence from graduates who experienced serious effects from auto-correction features over time. Notably, these participants occupy a unique transitional period, having witnessed the introduction of technology into education. Graduates who pursued their undergraduate studies in the late 20th and early 21st centuries had substantial engagement with hard copies of books and documents and with manual notetaking using pen and paper. This method indirectly fostered their cognitive and language development processes (Baker 1994; Dickinson et al. 2012). In contrast, the evidence presented by the sample highlights a clear distinction between the pre- and post-effects of reliance on text-processing tools.
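The word-list limitation discussed at the opening of Section 4.1 can be sketched with a toy model of list-based auto-correction. This is an assumption-laden illustration, not MS Word's actual algorithm: the replacement table and the example sentence (including the misspelt technical term "georeferncing") are invented. The point it shows is the one the interviewees raise: "known" misspellings are silently rewritten, while out-of-vocabulary technical terms pass through unflagged.

```python
# Toy model of list-based auto-correction (NOT MS Word's real algorithm).
# It only fixes misspellings present in its pre-installed table, which is
# exactly the limitation discussed above: anything outside the list is left
# alone, and the writer is never shown the changes that are made.
AUTOCORRECT_TABLE = {  # invented pre-installed word list
    "teh": "the",
    "recieve": "receive",
    "seperate": "separate",
}

def autocorrect(text):
    corrected, silent_fixes = [], []
    for word in text.split():
        if word.lower() in AUTOCORRECT_TABLE:
            corrected.append(AUTOCORRECT_TABLE[word.lower()])
            silent_fixes.append(word)   # the user is not alerted to this fix
        else:
            corrected.append(word)      # unknown errors pass through untouched
    return " ".join(corrected), silent_fixes

text = "teh georeferncing step must recieve valid coordinates"
fixed, fixes = autocorrect(text)
print(fixed)   # "the georeferncing step must receive valid coordinates"
print(fixes)   # ['teh', 'recieve']; the misspelt technical term was untouched
```

Under this sketch, a writer sees a mostly clean sentence, receives no signal about the two silent fixes, and gets no help with the discipline-specific term, which mirrors the dependence and missed-learning effects the postgraduate interviewees describe.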
The Sri Lankan primary and secondary education system, which traditionally emphasized physical, hand-written work, contributed positively to students’ spelling and language accuracy and incorporated metacognitive concerns (Baker 1994). However, as students progressed into tertiary education and gained more exposure to digital technology, the anticipated language improvements stagnated or declined.
A comparative analysis of the hand-written content of YI SI and YIII SII students provided additional insights. The YI SI cohort, with less exposure to text-processing software, made fewer errors in spelling and punctuation. The reason is the impact of the traditional learning mode during their school years, with a greater focus on pen-and-paper based writing, extracting, and learning (Vičič 2020). In contrast, the YIII SII students, who relied extensively on technology over an extended period (three or more years at the university during the COVID-19 pandemic), showed no marked improvement in their hand-written assessments. This pattern suggests that extensive dependence on auto-correction tools diminishes the language awareness of students, leading to persistent errors that hinder academic writing precision (Baker 1994).

Table 2. Themes developed from the semi-structured interviews with postgraduate students.

Theme: Limitations and challenges in using auto-correction in text-processing applications

A. “sometimes, we do work or research on a specific field, … But when we insert some technical terms, the Word app does not accept those and automatically provides some other words with a different meaning …”

B. “Because scientific terms, technical terms and other subject-specific acronyms are not provided in Word, so we need to add those terms into the dictionary manually and again, these terms get inter-mingled with other terms later, and sometimes it automatically corrects the terms with other words according to its pre-installed rules.”

C. “Word document has no connection with the writing style and the tone of the composition. The Word software does not address these stylistic concerns …”

Theme: Coping mechanisms for overcoming limitations

D. “Hmm … yes, I think I have lost my confidence in spelling when compared to school days and undergraduate days. During my undergraduate years, I am talking nearly 15 years back, okay? We did everything manually, drafted maps manually, wrote our field reports manually, submitted our assignments manually, composed tutorials and reports manually, referred to hard copies of books and documents, noted things manually and referred to hard copies of books in the library. From all those activities, I was so confident about my language, which improved drastically once I started my undergraduate degree, and reading the prescribed books in the library and noting and extracting things manually seriously impacted my language development … But once I started doing my master’s and other research and academic works with technology, let’s say after 2010 onwards, I strongly used computers and used softcopies of documents and started copying and pasting, which is easier and more efficient but has drastically impacted my language competency, … I feel like I am stuck and not any development … impact of software must have contributed a dominant proportion to this dependency.”

E. “… Thus, I use the Word software always, but now I use AI tools more often to finetune my language and writing style. I would draft the initial content in a Word document and then use the available AI tools to enhance the quality of the writing style and omit the grammatical errors.”

Theme: Impact of text-processing software on language development among ESL context users

F. “rather than for language development, it makes it easier and more efficient to handle the situation. Actually, we do not usually focus on language development; … But … we identify that some sort of grammatical suggestions provided in the word app make us aware of our mistakes. For some users, this can be helpful in identifying that they have grammar mistakes. I don’t think it is a successful way to develop language skills.”

G. “it has an impact on language development, but it is very limited, right? It provides some suggestions and corrections in some instances, but that is not sufficient and a good way to proceed. But, if someone is intentionally trying to learn through every minute instance as an opportunity to develop their language, then this can be considered as a sufficient way to develop language skills. But I don’t know whether Word can cover all grammar concerns, especially when it comes to complex, longer sentences. However, for learners at the beginner level, to compose a composition with fewer errors, Word is beneficial.”

H. “I believe that Word automatically corrects major grammar errors like subject-verb agreement, and spelling errors, capitalization stuff and period related concerns, but this automatic correction might harm learners because the user has no idea whether he has written correctly because prior to detection, the software itself corrects the error. However, I admit that the suggestions provided by Word for grammar and spelling concerns have a severe impact on learning language. Because the suggestion or explanation can lead the user towards a better understanding of the language. On the other hand, automatic correction can lead users towards lack of awareness about writing mechanics that are mandatory for academic concerns.”

I. “However, auto-correction …, is not a good thing. Because, I personally must admit that the worst scenario is my experience: I once participated in a spelling competition during my school days, and I was very good at English compared to my colleagues back then. However, now, the changes that have happened to me are actually very bad. Now I am not confident of my spelling capacity, and I always recheck it with an available online tool. Even though working as an academic and a researcher, I am continuously engaging with text processors. But I recognized that whenever I have to write something using pen and paper, I get stuck with spelling, seriously.”
Figure 4 illustrates the most relevant error types that the automatic correction feature directly addresses, often without user awareness. These error types include spelling and punctuation; the most significant sub-categories are omission errors, case sensitivity issues, double letters in spelling, misplaced or missing commas, apostrophes, and the absence of periods. The postgraduate cluster provides evidence that continued exposure to text processors seriously affects the writing mechanics of users, as is evident in the answers of Interviewees H and I in Table 2. Accordingly, the long-term exposure of this middle-aged cluster, who experienced the transition from the conventional mode of learning to the flipped mode after the pandemic, indicated a deterioration in their spelling confidence due to prolonged entanglement with technology-enhanced writing tools (Reed et al. 2021; Wei, Wang, and Dong 2023). When comparing the three tests, most errors were found in the tasks completed using the notepad, possibly owing to typing speed, technical literacy, and related challenges (Van Waes et al. 2021). A comparison of the handwritten content with the text-processed content showed a significant reduction in most error types, particularly no-period errors, case sensitivity issues, and omissions (Figure 3). This suggests that text-processing software has a positive short-term impact on producing more accurate written work (Van Der Steen, Samuelson, and Thomson 2017). However, its long-term use negatively affects human autonomy, language competence, and user confidence (Bickmore and Picard 2005; Schaefer et al. 2016). Year-based analysis highlights the significant impact of punctuation errors. As shown in Figure 3, punctuation errors were notably reduced in text-processed content compared to the handwritten and notepad tasks.
However, many third-year undergraduates continued to exhibit punctuation errors in handwritten work, likely because of prolonged reliance on automatic correction, which conceals such mistakes (Van Waes et al. 2021). Consequently, the quality of their hand-written work shows minimal improvement over that of newly enrolled undergraduates. A key factor underlying this trend is the foundational cognitive development that occurs during school years. Activities that integrate traditional methods of writing played a critical role in establishing robust language mechanics (Kellogg 2008). However, the transition to university-level education, particularly during the COVID-19 pandemic, drastically altered the learning environment. Conventional, in-person educational practices in Sri Lanka were abruptly replaced by rigid online modes, causing a significant disruption without sustainable curricular adjustments and required facilities (Lekamge and Rajavarathan 2024). This shift fostered heavy dependence on digital tools, leaving students without essential skills in traditional writing practices and increasingly reliant on technology for language functions. The results suggest that prolonged reliance on auto-correction may delay language learners’ ability to produce error-free compositions, diminishing their innate writing accuracy over time, as reflected in the persistence of minor errors (Omer Ismael et al. 2022). This finding is further supported by the postgraduate cluster, who reported losing confidence in producing error-free content. The long-term impact of heavy technological dependence, particularly the auto-correction feature in text-processing applications, negatively affects user confidence (Bickmore and Picard 2005; Schaefer et al. 2016).
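The single-edit spelling sub-categories counted in this study (omission, insertion, substitution, transposition, double letters, case sensitivity) can be made concrete with a small classifier. The following is an illustrative sketch only, written for this discussion; the function name and the rule for attributing a dropped half of a doubled letter are our own assumptions, not the study’s coding instrument or MS Word’s detection logic.

```python
# Illustrative sketch (our own coding, not the study's instrument or
# MS Word's engine): classify a misspelling against its intended form
# using the single-edit sub-categories counted in the paper.

def classify_spelling_error(typed: str, intended: str) -> str:
    if typed == intended:
        return "no error"
    if typed.lower() == intended.lower():
        return "case sensitivity"
    t, i = typed.lower(), intended.lower()
    if len(t) + 1 == len(i):                      # one letter missing
        for k in range(len(i)):
            if t == i[:k] + i[k + 1:]:
                # a dropped half of a doubled letter is counted
                # separately from a plain omission (an assumption)
                doubled = (k > 0 and i[k] == i[k - 1]) or \
                          (k + 1 < len(i) and i[k] == i[k + 1])
                return "double letters" if doubled else "omission"
    if len(t) == len(i) + 1:                      # one letter extra
        for k in range(len(t)):
            if i == t[:k] + t[k + 1:]:
                return "insertion"
    if len(t) == len(i):
        diffs = [k for k in range(len(t)) if t[k] != i[k]]
        if len(diffs) == 1:
            return "substitution"
        if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and t[diffs[0]] == i[diffs[1]]
                and t[diffs[1]] == i[diffs[0]]):
            return "transposition"
    return "other"                                # multi-edit errors

print(classify_spelling_error("teh", "the"))         # transposition
print(classify_spelling_error("runing", "running"))  # double letters
```

Counting tallies of these categories over a set of (typed, intended) pairs is one way the per-year error profiles reported in Figures 3 and 4 could be reproduced from raw scripts.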
Figure 3. Detected errors (by year): How exposure to technology has affected the language performance of undergraduates.

4.2 Impact of Identifying and Emphasizing Errors to the User

The second feature of text-processing applications highlights incorrect words, grammatical issues, and other language concerns, offering suggestions for correction (Kukich 1992; S. Singh and Singh 2018). According to the results, this feature covers a broad range of errors: grammar rule-related errors, spelling errors, vocabulary-related errors, and other error types. This feature positively impacts language learners in the ESL context by identifying areas that need improvement and raising awareness of their mistakes (Wood 2014). This capability is useful in highlighting errors in answer scripts, allowing learners to develop their language skills through cognitive engagement (Pinet and Nozari 2022; Rüdian, Dittmeyer, and Pinkwart 2022; Sherafati, Largani, and Amini 2020). When users click on a highlighted error, the software offers accurate suggestions, aiding conscious correction. Users can also add technical terms to the dictionary or auto-correction list, benefiting academic writers by allowing them to focus more on content rather than on language errors (Khansir 2012).
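The dictionary and custom-term mechanism described above can be sketched in a few lines. This is a hedged illustration, not MS Word’s actual implementation or API: the word list, function names, and the use of Python’s difflib similarity matching are assumptions chosen to show why an unknown technical term gets flagged and matched to a look-alike until the user adds it to a custom dictionary.

```python
# Hypothetical sketch of a dictionary-based checker: tokens missing
# from the dictionary are flagged and mapped to the closest known
# word, which is how a technical term can be "corrected" into a
# similar-looking ordinary word.
from difflib import get_close_matches

BASE_DICTIONARY = {"the", "economics", "analysis", "of", "data"}

def check(text, custom_terms=frozenset()):
    """Return a dict mapping unknown tokens to suggested replacements."""
    known = BASE_DICTIONARY | {t.lower() for t in custom_terms}
    issues = {}
    for token in text.split():
        word = token.strip(".,;:").lower()
        if word and word not in known:
            # the top-ranked look-alike is what silent auto-correction
            # would substitute for the unknown term
            issues[word] = get_close_matches(word, known, n=1)
    return issues

# The technical term "genomics" is unknown, so the checker proposes
# the look-alike dictionary word "economics".
print(check("the genomics analysis"))
# Once the user adds the term to the custom dictionary, it is accepted.
print(check("the genomics analysis", custom_terms={"genomics"}))  # {}
```

The second call illustrates the manual dictionary additions the interviewees describe: the burden of keeping `custom_terms` current falls entirely on the user, which is the feasibility problem noted below.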
However, the postgraduate cluster revealed that the software often fails to recognize technical terms and acronyms, automatically replacing these with similar-looking alternatives (Table 2 – Interviewees A and B). To avoid this, users must manually add scientific terms to the software dictionary (Cook and Jensen 2019; Salton and Lesk 1968). However, manual additions are not feasible for all users on account of their varying levels of comfort with technology, and the fact that continuous dictionary updates are time-consuming. Even so, the feature serves as a positive catalyst for language learners in ESL contexts, as it integrates cognitive processes with the task (Goonawardena et al. 2022; Rüdian, Dittmeyer, and Pinkwart 2022; Wood 2014). By purposefully correcting highlighted concerns, users experience a significant impact on their cognitive processes, leading to a conscious awareness of errors (Ellis et al. 2008; Pinet and Nozari 2022). Additionally, the provision of grammar suggestions enhances learners’ theoretical understanding of grammar rules (Ellis et al. 2008; Ji and Liu 2018). Consequently, the suggestion-provider feature enhances language accuracy by reducing the potential harm the software would otherwise cause to language users.

For further clarification, twenty answer scripts were randomly chosen for review based on mistakes highlighted by the Word software, as displayed in Figure 5. Regardless of study phase, most errors were in spelling (above 300 occurrences) and punctuation (above 250 occurrences). A significant finding, however, is that the software fails to identify stylistic aspects and some vocabulary errors (Putze et al. 2017; Rüdian, Dittmeyer, and Pinkwart 2022; Shadiev and Wang 2022). MS Word does not address writing style, paragraph structure, coherence, or cohesion. Long-term users of the software report that they incorporate larger-scale AI tools to enhance their writing style, coherence, and cohesion (Tica and Krsmanović 2024).
This clearly prompts software developers to enhance the features in text-processing software (Alharbi 2023; Benali 2021; Gayed et al. 2022; Jajić Novogradec 2021; Ranalli and Yamashita 2022; Salton and Lesk 1968). Compared to the Word document-based task, YIII SII students showed a reduction in error occurrences, with the greatest decrease in other error types such as repetition and incomplete terms. Punctuation errors (apostrophe and comma issues) followed in reduction, while grammar errors, including tense and subject-verb agreement, showed the third most significant improvement (Fan and Ma 2022; Ferris and Roberts 2001). However, spelling errors show the smallest difference across the years of study, reflecting students’ limited proficiency in technical and subject-related terminology. These terminology-related spelling errors are common in their hand-written content and are displayed in Figure 5.

As shown in Figure 5, most errors were found in the YI SI cluster. However, YIII SII students committed notably more punctuation errors in the handwritten task, with a count of 221, while other error types were lower compared to the YI SI cluster. Apart from that, stylistic errors were not detected by the text-processing software (Gröndahl and Asokan 2020; Stamatatos, Fakotakis, and Kokkinakis 2000), resulting in no recorded values for these errors in Figure 4. However, the researcher identified numerous stylistic errors in the hand-written content (Figure 5), particularly in the subcategories of clarity, tone, register, and sentence variety. Incorrect word usage was a recurring error in the sample. However, the software failed to detect it, because the pre-installed logic lacks the capacity to interpret semantic accuracy or assess the higher-level cognitive attributes involved in human language processing (Rüdian, Dittmeyer, and Pinkwart 2022; Salton and Lesk 1968; Wood 2014).
Hence, this limitation highlights a significant gap in current text-processing applications. However, the most recent developments in artificial intelligence (AI) have begun to address this concern, as evidenced by the data obtained from the postgraduate cluster (Table 2). Another noteworthy observation is that the software did not identify issues related to paragraph structure and organization. These issues were highly apparent in the YI SI cluster but were significantly less prominent in the YIII SII cluster. This insight further supports the fact that, with gradual exposure to the language through CLIL (Content and Language Integrated Learning), students make noticeable progress in their language development. This valuable insight emerged as a by-product of the study and is worthy of further exploration.

Figure 4. Number of occurrences of error types as highlighted on the task sheet (completed using text-processing software).

Another significant finding is that students in the YI SI cluster demonstrate stronger language skills in their hand-written work (spelling basic terminology and punctuation use). As these students gradually gain exposure to subject-specific terminology and technical terminology, the frequency of errors decreases, as observed in the YIII SII cluster.
In contrast, students in the YIII SII cluster exhibit a higher frequency of punctuation errors, likely due to continued reliance on text-processing software. The most noticeable effects of this reliance become apparent during hand-written assessments, where automated support is unavailable. These findings are also reinforced by triangulated data from the postgraduate cluster, emphasizing the pervasive impact on writing mechanics. The data illustrate that this issue is not isolated but a widespread concern that affects many users. Prolonged dependence on auto-correction features has resulted in a gradual decline in essential language competences, signalling an urgent need for intervention to preserve and enhance human cognitive abilities in language functions (Brenner et al. 2021; Huseinović 2022; Merzifonluoğlu and Takkaç Tulgar 2023).

Another notable finding identified in the study is the recurrent use of abbreviated forms and informal texting language in the hand-written content of both clusters (Booton, Hodgkiss, and Murphy 2023; Dwivedi et al. 2023; Genlott and Grönlund 2013; Jonsson and Blåsjö 2020). This phenomenon adversely affects their academic writing, often introducing an inappropriately informal tone into formal writing samples. The evidence indicates that text-processing applications exert both positive and negative influences on the academic language development of ESL learners (Alharbi 2021; Mahapatra 2024). Thus, the current study highlights the importance of language instructors and educational practitioners strategically leveraging the beneficial aspects of these tools to enhance the linguistic competence of learners. Simultaneously, it suggests that software developers should incorporate features that actively engage cognitive processes, promoting more effective language development rather than focusing on the efficiency of text production (Booton, Hodgkiss, and Murphy 2023; Jia et al. 2019; Khan et al. 2023).
Figure 5. Year-wise comparison of error occurrences: A comparison between the hand-written content and the text-processing application-based content.
5 Conclusion

The study reveals that text-processing software (MS Word) with features such as auto-correction and error suggestions can provide valuable support in reducing certain language errors and increasing user focus on quality content. However, it also poses long-term challenges to language learners in the ESL context by making the users more dependent on the software and eventually diminishing writing mechanics like punctuation and spelling.
Since the ESL context provides little English input from the surrounding society, learners’ encounters and interaction with English-language content are minimal. In such a context, modern technology offers an indirect path to obtaining English language content. Thus, potential software developments should address avenues to enhance language development. However, prolonged reliance on these tools can diminish student awareness of spelling and punctuation errors and delay the development of independent writing skills. Although the software enhances grammar and vocabulary accuracy, it fails to address stylistic and structural matters, which are crucial for academic writing. Moreover, informal language habits reinforced by frequent technology use can negatively impact academic writing quality. These findings suggest the need for balanced integration of technology in language education, emphasizing the importance of developing both technical accuracy and stylistic proficiency in ESL learners.

References

A. Al-Mutairi, Mohammad. 2019. “Kachru’s three concentric circles model of English language: An overview of criticism & the place of Kuwait in it.” English Language Teaching 13 (1): 85. https://doi.org/10.5539/elt.v13n1p85.

Ahmed, Sirwan Khalid. 2024. “The pillars of trustworthiness in qualitative research.” Journal of Medicine, Surgery, and Public Health 2: 100051. https://doi.org/10.1016/j.glmedi.2024.100051.

Ajaj, Israa Eibead. 2022. “Investigating the difficulties of learning English grammar and suggested methods to overcome them.” Journal of Tikrit University for Humanities 29 (6): 45–58. https://doi.org/10.25130/jtuh.29.6.2022.24.

Alharbi, Sultan H. 2021. “The struggling English language learners: Case studies of English language learning difficulties in EFL context.” English Language Teaching 14 (11): 108. https://doi.org/10.5539/elt.v14n11p108.

Alharbi, Wael. 2023.
“AI in the foreign language classroom: A pedagogical overview of automated writing assistance tools.” Education Research International 2023: 1–15. https://doi.org/10.1155/2023/4253331.

Ali, Hewa Fouad, Lisa Jamal Nakshbandi, Fatima Saadi, and Sami Hussein Hakeem Barzani. 2022. “The effect of spell-checker features on spelling competence among EFL Learners: An empirical study.” International Journal of Social Sciences & Educational Studies 9 (3): 101–11. https://doi.org/10.23918/ijsses.v9i3p101.

Amnuai, Wirada. 2020. “An error analysis of research project abstracts written by Thai undergraduate students.” Advances in Language and Literary Studies 11 (4): 13.

Baker, Linda. 1994. “Fostering metacognitive development.” Advances in Child Development and Behavior 25: 201–39. https://doi.org/10.1016/S0065-2407(08)60053-1.

Balla, Ervin. 2023. “Impact of technology in acquisition of English language.” Journal of Educational and Social Research 13 (1): 134–45. https://doi.org/10.36941/jesr-2023-0012.

Baron, Dennis E. 2023. A Better Pencil: Readers, Writers, and the Digital Revolution. 1st ed. Oxford University Press.

Barrot, Jessie S. 2023. “Using automated written corrective feedback in the writing classrooms: Effects on L2 writing accuracy.” Computer Assisted Language Learning 36 (4): 584–607. https://doi.org/10.1080/09588221.2021.1936071.

Baxter, L.A. 1991. Content Analysis. The Guilford Press.

Bećirović, Senad, Amna Brdarević-Čeljo, and Haris Delić. 2021. “The use of digital technology in foreign language learning.” SN Social Sciences 1 (10): 246. https://doi.org/10.1007/s43545-021-00254-y.

Benali, Ameni. 2021. “The impact of using automated writing feedback in ESL/EFL classroom contexts.” English Language Teaching 14 (12): 189–95. https://doi.org/10.5539/elt.v14n12p189.

Bickmore, Timothy W., and Rosalind W. Picard. 2005.
“Establishing and maintaining long-term human-computer relationships.” ACM Transactions on Computer-Human Interaction 12 (2): 293–327. https://doi.org/10.1145/1067860.1067867.

Booton, Sophie A., Alex Hodgkiss, and Victoria A. Murphy. 2023. “The impact of mobile application features on children’s language and literacy learning: A systematic review.” Computer Assisted Language Learning 36 (3): 400–429. https://doi.org/10.1080/09588221.2021.1930057.

Bozavli, Ebubekir. 2023. “The relationship between the use of technology and technology addiction in learning foreign language.” Arab World English Journal 14 (3): 418–30. https://doi.org/10.24093/awej/vol14no3.27.

Braun, Virginia, and Victoria Clarke. 2006. “Using thematic analysis in psychology.” Qualitative Research in Psychology 3 (2): 77–101. https://doi.org/10.1191/1478088706qp063oa.

Brenner, Maria, Denise Alexander, Mary Brigid Quirke, Jessica Eustace-Cook, Piet Leroy, Jay Berry, Martina Healy, Carmel Doyle, and Kate Masterson. 2021. “A systematic concept analysis of ‘technology dependent’: Challenging the terminology.” European Journal of Pediatrics 180 (1): 1–12. https://doi.org/10.1007/s00431-020-03737-x.

Catelly, Yolanda-Mirela. 2014. “Optimizing language assessment – Focus on test specification and piloting.” Procedia – Social and Behavioral Sciences 128 (April): 393–98. https://doi.org/10.1016/j.sbspro.2014.03.177.

Cook, Helen V., and Lars Juhl Jensen. 2019. “A guide to dictionary-based text mining.” In Bioinformatics and Drug Discovery, 3rd ed., edited by Richard S. Larson and Tudor I. Oprea, 73–89. Springer. https://doi.org/10.1007/978-1-4939-9089-4_5.

Cummings, Lance. 2023. “Writing processes in the digital age: A networked interpretation.” In Digital Writing Technologies in Higher Education, edited by Otto Kruse, Christian Rapp, Chris M. Anson, Kalliopi Benetos, Elena Cotos, Ann Devitt, and Antonette Shibani, 485–97. Springer International Publishing. https://doi.org/10.1007/978-3-031-36033-6_30.
De Silva, W. Indralal, Pamoda Kodikara, and Ruwani Somarathne. 2014. “Sri Lankan youth and their exposure to computer literacy.” Sri Lanka Journal of Advanced Social Studies 3 (1): 27–52. https://doi.org/10.4038/sljass.v3i1.7127.

DeJonckheere, Melissa, and Lisa M. Vaughn. 2019. “Semistructured interviewing in primary care research: A balance of relationship and rigour.” Family Medicine and Community Health 7 (2): e000057. https://doi.org/10.1136/fmch-2018-000057.

Dickinson, David K., Julie A. Griffith, Roberta Michnick Golinkoff, and Kathy Hirsh-Pasek. 2012. “How reading books fosters language development around the world.” Child Development Research 2012: 602807. https://doi.org/10.1155/2012/602807.

Dwivedi, Yogesh K., Nir Kshetri, Laurie Hughes, Emma Louise Slade, Anand Jeyaraj, Arpan Kumar Kar, Abdullah M. Baabdullah, et al. 2023. “Opinion paper: ‘So what if ChatGPT wrote it?’ Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy.” International Journal of Information Management 71: 102642. https://doi.org/10.1016/j.ijinfomgt.2023.102642.

Ellis, Rod, Younghee Sheen, Mihoko Murakami, and Hide Takashima. 2008. “The effects of focused and unfocused written corrective feedback in an English as a foreign language context.” System 36 (3): 353–71. https://doi.org/10.1016/j.system.2008.02.001.

Fan, Ning, and Yingying Ma. 2022. “The effects of automated writing evaluation (AWE) feedback on students’ English writing quality: A systematic literature review.” Language Teaching Research Quarterly 28: 53–73. https://doi.org/10.32038/ltrq.2022.28.03.

Ferris, Dana, and Barrie Roberts. 2001. “Error feedback in L2 writing classes.” Journal of Second Language Writing 10 (3): 161–84. https://doi.org/10.1016/S1060-3743(01)00039-X.

Fitria, Tira Nur. 2021.
“Grammarly as AI-powered English writing assistant: Students’ alternative for writing English.” Metathesis: Journal of English Language, Literature, and Teaching 5 (1): 65–78.

Gamage, Premila, and Edward F. Halpin. 2007. “E‐Sri Lanka: Bridging the digital divide.” The Electronic Library 25 (6): 693–710. https://doi.org/10.1108/02640470710837128.

Gayed, John Maurice, May Kristine Jonson Carlon, Angelu Mari Oriola, and Jeffrey S. Cross. 2022. “Exploring an AI-based writing assistant’s impact on English language learners.” Computers and Education: Artificial Intelligence 3: 100055. https://doi.org/10.1016/j.caeai.2022.100055.

Genlott, Annika Agélii, and Åke Grönlund. 2013. “Improving literacy skills through learning reading by writing: The iWTR method presented and tested.” Computers & Education 67 (September): 98–104. https://doi.org/10.1016/j.compedu.2013.03.007.

Goonawardena, Mithma, Ashini Kulatunga, Raveena Wickramasinghe, Thisuraka Weerasekara, Hansi De Silva, and Samantha Thelijjagoda. 2022. “Automated spelling checker and grammatical error detection and correction model for Sinhala language.” In 2022 International Research Conference on Smart Computing and Systems Engineering (SCSE), Colombo, Sri Lanka, 184–89. IEEE. https://doi.org/10.1109/SCSE56529.2022.9905126.

Grimes, Douglas, and Mark Warschauer. 2010. “Utility in a fallible tool: A multi-site case study of automated writing evaluation.” The Journal of Technology, Learning and Assessment 8 (6): 4–42. https://ejournals.bc.edu/index.php/jtla/article/view/1625.

Gröndahl, Tommi, and N. Asokan. 2020. “Text analysis in adversarial settings: Does deception leave a stylistic trace?” ACM Computing Surveys 52 (3): 1–36. https://doi.org/10.1145/3310331.

Hair, Joseph F., Pratyush N. Sharma, Marko Sarstedt, Christian M. Ringle, and Benjamin D. Liengaard. 2024. “The shortcomings of equal weights estimation and the composite equivalence index in PLS-SEM.” European Journal of Marketing 58 (13): 30–55.
https://doi.org/10.1108/EJM-04-2023-0307.

Hiscox, Lucy, Erika Leonavičiūtė, and Trevor Humby. 2014. “The effects of automatic spelling correction software on understanding and comprehension in compensated dyslexia: Improved recall following dictation.” Dyslexia 20 (3): 208–24. https://doi.org/10.1002/dys.1480.

Hládek, Daniel, Ján Staš, and Matúš Pleva. 2020. “Survey of automatic spelling correction.” Electronics 9 (10): 1670. https://doi.org/10.3390/electronics9101670.

Hu, Betsy Xiaoqiong, and Xianxing Jiang. 2011. “Kachru’s three concentric circles and English teaching fallacies in EFL and ESL contexts.” Changing English 18 (2): 219–28. https://doi.org/10.1080/1358684X.2011.575254.

Huseinović, Lamija. 2022. “The relationship between digital competency, learning styles and learners’ perception of traditional versus technology-assisted language learning.” MAP Education and Humanities 3 (1): 17–30. https://doi.org/10.53880/2744-2373.2022.2.3.17.

Jajić Novogradec, Marina. 2021. “Positive and negative lexical transfer in English vocabulary acquisition.” ELOPE: English Language Overseas Perspectives and Enquiries 18 (2): 139–65. https://doi.org/10.4312/elope.18.2.139-165.

Ji, Chunyi, and Qi’ang Liu. 2018. “A study on the effectiveness of English grammar teaching and learning in Chinese junior middle schools.” Theory and Practice in Language Studies 8 (11): 1553–58. https://doi.org/10.17507/tpls.0811.24.

Jia, Jingdong, Xiaoying Yang, Rong Zhang, and Xi Liu. 2019. “Understanding software developers’ cognition in agile requirements engineering.” Science of Computer Programming 178: 1–19. https://doi.org/10.1016/j.scico.2019.03.005.

Jonsson, Carla, and Mona Blåsjö. 2020. “Translanguaging and multimodality in workplace texts and writing.” International Journal of Multilingualism 17 (3): 361–81. https://doi.org/10.1080/14790718.2020.1766051.

Kakarash, Zana Azeez. 2023. “Why is data validation important in research?” ResearchGate.
https://doi.org/10.13140/RG.2.2.34496.81920.

Kellogg, Ronald T. 2008. “Training writing skills: A cognitive developmental perspective.” Journal of Writing Research 1 (1): 1–26. https://doi.org/10.17239/jowr-2008.01.01.1.

Khan, Aashiq, Irum Zeb, Yan Zhang, and Tahir. 2023. “Impact of emerging technologies on cognitive development: The mediating role of digital social support among higher education students.” IJERI: International Journal of Educational Research and Innovation 20: 1–15. https://doi.org/10.46661/ijeri.8362.

Khansir, Ali Akbar. 2012. “Error analysis and second language acquisition.” Theory and Practice in Language Studies 2 (5): 1027–32. https://doi.org/10.4304/tpls.2.5.1027-1032.

Kiger, Michelle E., and Lara Varpio. 2020. “Thematic analysis of qualitative data: AMEE guide no. 131.” Medical Teacher 42 (8): 846–54. https://doi.org/10.1080/0142159X.2020.1755030.

Kim, Hye-Kyung. 2012. “The effectiveness of correcting grammatical errors in writing classes: An EFL teacher’s perspective.” International Journal of Literacy, Culture, and Language Education 1: 227–37. https://doi.org/10.14434/ijlcle.v1i0.26836.

Kontogiannis, Tom. 1999. “User strategies in recovering from errors in man–machine systems.” Safety Science 32 (1): 49–68. https://doi.org/10.1016/S0925-7535(99)00010-7.

Kruse, Otto, and Christian Rapp. 2023. “Word processing software: The rise of MS Word.” In Digital Writing Technologies in Higher Education, edited by Otto Kruse, Christian Rapp, Chris M. Anson, Kalliopi Benetos, Elena Cotos, Ann Devitt, and Antonette Shibani, 15–32. Springer International Publishing. https://doi.org/10.1007/978-3-031-36033-6_2.

Kukich, Karen. 1992. “Techniques for automatically correcting words in text.” ACM Computing Surveys 24 (4): 377–439. https://doi.org/10.1145/146370.146380.

Larsson, Anthony, and Robin Teigland, eds. 2020.
The Digital Transformation of Labor: Automation, the Gig Economy and Welfare. Routledge.

Lekamge, Rashmika, Chitra Jayathilake, and Clayton Smith. 2024. “Language-related barriers and insights to overcome the challenges of English medium instructed learning environment for undergraduates.” International Journal of Current Education Studies 3 (1): 28–53. https://doi.org/10.5281/zenodo.12193460.

Lekamge, Rashmika, and Jenan Rajavarathan. 2024. “Enhancing academic writing proficiency among English as a second language users at the undergraduate level: A comparative analysis of student-lecturer perspectives and strategies.” Journal of Research and Education 10 (1): 37–76.

Leung, Lawrence. 2015. “Validity, reliability, and generalizability in qualitative research.” Journal of Family Medicine and Primary Care 4 (3): 324. https://doi.org/10.4103/2249-4863.161306.

Lin, Po-Han, Tzu-Chien Liu, and Fred Paas. 2017. “Effects of spell checkers on English as a second language students’ incidental spelling learning: A cognitive load perspective.” Reading and Writing 30 (7): 1501–25. https://doi.org/10.1007/s11145-017-9734-4.

Magaldi, Danielle, and Matthew Berler. 2020. “Semi-structured interviews.” In Encyclopedia of Personality and Individual Differences, edited by Virgil Zeigler-Hill and Todd K. Shackelford, 4825–30. Springer International Publishing. https://doi.org/10.1007/978-3-319-24612-3_857.

Mahapatra, Santosh. 2024. “Impact of ChatGPT on ESL students’ academic writing skills: A mixed methods intervention study.” Smart Learning Environments 11 (1): 9. https://doi.org/10.1186/s40561-024-00295-9.

Maier, Christian, Jason Bennett Thatcher, Varun Grover, and Yogesh K. Dwivedi. 2023. “Cross-sectional research: A critical perspective, use cases, and recommendations for IS research.” International Journal of Information Management 70: 102625. https://doi.org/10.1016/j.ijinfomgt.2023.102625.

Merzifonluoğlu, Ayşe, and Ayşegül Takkaç Tulgar. 2023.
“The effect of technology-supported language learning on communication competencies.” Erzincan Üniversitesi Eğitim Fakültesi Dergisi 25 (3): 524– 37. https://doi.org/10.17556/erziefd.1334195. Morse, Janice M., Michael Barrett, Maria Mayan, Karin Olson, and Jude Spiers. 2002. “Verification strategies for establishing reliability and validity in qualitative research.” International Journal of Qualitative Methods 1 (2): 13–22. https://doi.org/10.1177/160940690200100202. Naderifar, Mahin, Hamideh Goli, and Fereshteh Ghaljaie. 2017. “Snowball sampling: A purposeful method of sampling in qualitative research.” Strides in Development of Medical Education 14 (3). https://doi.org/10.5812/sdme.67670. 89ACADEMIC WRITING Naeem, Muhammad, Wilson Ozuem, Kerry Howell, and Silvia Ranfagni. 2023. “A step-by-step process of thematic analysis to develop a conceptual model in qualitative research.” International Journal of Qualitative Methods 22: 16094069231205789. https://doi.org/10.1177/16094069231205789. Nejja, Mohammed, and Abdellah Yousfi. 2015. “The context in automatic spell correction.” Procedia Computer Science 73: 109–14. https://doi.org/10.1016/j.procs.2015.12.055. Neto, Arthur Flor De Sousa, Byron Leite Dantas Bezerra, and Alejandro Héctor Toselli. 2020. “Towards the natural language processing as spelling correction for offline handwritten text recognition systems.” Applied Sciences 10 (21): 7711. https://doi.org/10.3390/app10217711. Omer Ismael, Kozhin, Kochar Ali Saeed, Airin Shwan Ibrahim, and Diya Shawkat Fatah. 2022. “Effects of auto-correction on students’ writing skill at three different universities in Sulaimaneyah City.” Arab World English Journal 8: 231–45. https://doi.org/10.24093/awej/call8.16. Pinet, Svetlana, and Nazbanou Nozari. 2022. “Correction without consciousness in complex tasks: Evidence from typing.” Journal of Cognition 5 (1): 11. https://doi.org/10.5334/joc.202. Prasangani, Kariyawasam Sittarage. 2018. 
“English language education in Sri Lanka Link with the learners’ motivational factors.” HLT Magazine, August. Putze, Felix, Maik Schünemann, Tanja Schultz, and Wolfgang Stuerzlinger. 2017. “Automatic classification of auto-correction errors in predictive text entry based on EEG and context information.” In Proceedings of the 19th ACM International Conference on Multimodal Interaction, 137–45. Association for Computing Machinery. https://doi.org/10.1145/3136755.3136784. Rahimi, Mehrak, Gholamreza Gholizadeh, and Ali Shahryari. 2019. “Iranian EFL learners’ perceptions about automatic spelling correction software use for learning English spellings: A study with focus on gender.” International Journal of English Language and Translation Studies 7 (1): 68–75. Ranalli, Jim, and Taichi Yamashita. 2022. “Automated written corrective feedback: Error correction performance and timing of delivery.” Language Learning & Technology 26 (1): 1–25. http:// hdl.handle.net/10125/73465. Reed, M. S., M. Ferré, J. Martin-Ortega, R. Blanche, R. Lawford-Rolfe, M. Dallimer, and J. Holden. 2021. “Evaluating impact from research: A methodological framework.” Research Policy 50 (4): 104147. https://doi.org/10.1016/j.respol.2020.104147. Ren, Simin, and Paul Seedhouse. 2024. “Doing language testing: Learner-initiated side sequences in a technology-mediated language learning environment.” Classroom Discourse 15 (4): 317–52. https:// doi.org/10.1080/19463014.2024.2305446. Rosairo, H. S. R. 2023. “Thematic analysis in qualitative research.” Journal of Agricultural Sciences – Sri Lanka 18 (3). https://doi.org/10.4038/jas.v18i3.10526. Rüdian, Leo Sylvio, Moritz Dittmeyer, and Niels Pinkwart. 2022. “Challenges of using auto-correction tools for language learning.” In LAK22: 12th International Learning Analytics and Knowledge Conference, 426–31. Association for Computing Machinery. https:// doi.org/10.1145/3506860.3506867. Salehi, Mohammad, and Ava Bahrami. 2018. 
“An error analysis of journal papers written by Persian authors.” Cogent Arts & Humanities 5 (1): 1537948. https://doi.org/10.1080/23311983.2018.1537948. Salton, G., and M. E. Lesk. 1968. “Computer evaluation of indexing and text processing.” Journal of the ACM 15 (1): 8–36. https://doi.org/10.1145/321439.321441. Sanchez, Avigail T., Kairille France R. Arcila, Jerime L. Baldomero, Kianna Mhae P. Cahanding, Rachelle Anne A. De Leon, and Catherine A. Samson. 2023. “Roles of auto-correction tools on HUMSS students’ writing skills.” Proceedings of International Interdisciplinary Conference on Sustainable Developments Goals (IICSDGs) 6 (1): 99–113. Sanosi, Abdulaziz B. 2022. “The impact of automated written corrective feedback on EFL learners’ academic writing accuracy.” Journal of Teaching English for Specific and Academic Purposes 10 (2): 301– 17. https://doi.org/10.22190/JTESAP2202301S. Sari, Elif, and Turgay Han. 2024. “The impact of automated writing evaluation on English as a foreign language learners’ writing self‐efficacy, self‐regulation, anxiety, and performance.” Journal of Computer Assisted Learning 40 (5): 2065–80. https://doi.org/10.1111/jcal.13004. 90 Rashmika Lekamge, Clayton Smith Impact of Auto-Correction Features in Text-Processing Software on the Academic ... Saud, Jefriyanto, Lela Susanty, Petrus Jacob Pattiasina, Satriani, and Wajnah. 2023. “Exploring the influence of the environment on students’ second language acquisition: A comprehensive psycholinguistic study.” RETORIKA: Jurnal Ilmu Bahasa 9 (2): 174–84. https:// doi.org/10.55637/jr.9.2.7724.174-184. Schaefer, Kristin E., Jessie Y. C. Chen, James L. Szalma, and P. A. Hancock. 2016. “A meta-analysis of factors influencing the development of trust in automation: Implications for understanding autonomy in future systems.” Human Factors: The Journal of the Human Factors and Ergonomics Society 58 (3): 377–400. https://doi.org/10.1177/0018720816634228. Shadiev, Rustam, and Xun Wang. 2022. 
“A review of research on technology-supported language learning and 21st century skills.” Frontiers in Psychology 13: 897689. https:// doi.org/10.3389/fpsyg.2022.897689. Sherafati, Narjis, Farzad Mahmoudi Largani, and Shahrzad Amini. 2020. “Exploring the effect of computer-mediated teacher feedback on the writing achievement of Iranian EFL learners: Does motivation count?” Education and Information Technologies 25 (5): 4591–4613. https:// doi.org/10.1007/s10639-020-10177-5. Singh, Jitendra, and Tracy Eisenschenk. 2021. “A thematic analysis of the attitudes and perceptions of faculty towards inclusion of interprofessional education in healthcare curriculum.” International Journal of Health Sciences Education 8 (1). https://doi.org/10.59942/2325-9981.1117. Singh, Shashank, and Shailendra Singh. 2018. “Review of real-word error detection and correction methods in text documents.” In 2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA), 1076–81. IEEE. https://doi.org/10.1109/ICECA.2018.8474700. Stamatatos, Efstathios, Nikos Fakotakis, and George Kokkinakis. 2000. “Automatic text categorization in terms of genre and author.” Computational Linguistics 26 (4): 471–95. https:// doi.org/10.1162/089120100750105920. Steyn, Jacques, and Graeme Johanson. 2011. ICTs and Sustainable Solutions for the Digital Divide: Theory and Perspectives. Information Science Reference. Tica, Lena, and Ivana Krsmanović. 2024. “Overcoming the writer’s block? Exploring students’ motivation and perspectives on using ChatGPT as a writing assistance tool in ESP.” ELOPE: English Language Overseas Perspectives and Enquiries 21 (1): 129–49. https://doi.org/10.4312/elope.21.1.129-149. Toppelberg, Claudio O., and Brian A. Collins. 2010. “Language, culture, and adaptation in immigrant children.” Child and Adolescent Psychiatric Clinics of North America 19 (4): 697–717. https:// doi.org/10.1016/j.chc.2010.07.003. Truscott, John. 1999. 
“The case for ‘the case against grammar correction in L2 writing classes’: A response to Ferris.” Journal of Second Language Writing 8 (2): 111–22. https://doi.org/10.1016/S1060- 3743(99)80124-6. Van der Loo, Mark P. J., and Edwin de Jonge. 2020. “Data validation.” In Wiley StatsRef Statistics Reference Online. https://doi.org/10.1002/9781118445112.stat08255. Van Der Steen, Steffie, Dianne Samuelson, and Jennifer M. Thomson. 2017. “The effect of keyboard-based word processing on students with different working memory capacity during the process of academic writing.” Written Communication 34 (3): 280–305. https://doi.org/10.1177/0741088317714232. Van Waes, Luuk, Mariëlle Leijten, Jens Roeser, Thierry Olive, and Joachim Grabowski. 2021. “Measuring and assessing typing skills in writing research.” Journal of Writing Research 13 (1): 107–53. https:// doi.org/10.17239/jowr-2021.13.01.04. Vičič, Polona. 2020. “A fully integrated approach to blended language learning.” ELOPE: English Language Overseas Perspectives and Enquiries 17 (2): 219–38. https://doi.org/10.4312/elope.17.2.219-238. Vivek, Ramakrishnan, Yogarajah Nanthagopan, and Sarmatha Piriyatharshan. 2023. “Beyond methods: Theoretical underpinnings of triangulation in qualitative and multi-method studies.” SEEU Review 18 (2): 105–22. https://doi.org/10.2478/seeur-2023-0088. 91ACADEMIC WRITING Vodopija-Krstanović, Irena, and Maja Brala Vukanović. 2015. “Students of today changing English language studies of yesterday.” ELOPE: English Language Overseas Perspectives and Enquiries 12 (2): 175–89. https://doi.org/10.4312/elope.12.2.175-189. Wei, Ping, Xiaosai Wang, and Hui Dong. 2023. “The impact of automated writing evaluation on second language writing skills of Chinese EFL learners: A randomized controlled trial.” Frontiers in Psychology 14: 1249991. https://doi.org/10.3389/fpsyg.2023.1249991. Weigle, Sara Cushing. 2013. 
2025, Vol. 22 (1), 93–109
journals.uni-lj.si/elope
https://doi.org/10.4312/elope.22.1.93-109
UDC: [811.111’243:378(594)]:004.89

Tommy Hastomo, Andini Septama Sari, Utami Widiati, Francisca Maria Ivone, Evynurul Laily Zen
State University of Malang, Indonesia

Muhammad Fikri Nugraha Kholid
Raden Intan State Islamic University Lampung, Indonesia

Does Student Engagement with Chatbots Enhance English Proficiency?

ABSTRACT

This study examines how Indonesian university students’ engagement with chatbots influences their English proficiency. While AI tools are increasingly used in language education, little research focuses on chatbot interaction dynamics. The research assesses behavioural (active use), cognitive (perceived value), and emotional (attitudinal) engagement across 150 non-English majors at four proficiency levels (A1–B2). Data from engagement surveys and proficiency tests were analysed using ANOVA, correlation, and regression. Results indicated that higher-proficiency students (B1/B2) engaged more intensely with chatbots than their lower-level peers. Behavioural and cognitive engagement strongly correlated with improved language skills, while emotional engagement showed no significant link.
Regression analysis identified behavioural and cognitive engagement as key predictors of proficiency gains, suggesting that active interaction and perceived utility of chatbots drive language development. The findings underscore chatbots’ potential as effective language-learning aids.

Keywords: AI, chatbots, cognitive engagement, emotional engagement, English proficiency, Indonesian university students

Ali uporaba pogovornih sistemov prispeva k izboljšanju znanja angleščine pri študentih in študentkah?

IZVLEČEK

Študija preučuje vpliv uporabe pogovornih sistemov na znanje angleščine pri indonezijskih študentih in študentkah. Čeprav se ta orodja vse pogosteje uporabljajo pri učenju jezikov, je raziskav o njihovi interakciji malo. Raziskava zajema vedenjsko (aktivna uporaba), kognitivno (zaznana koristnost) in čustveno (odnosno) vključenost pri 150 sodelujočih, ki ne študirajo angleščine, na štirih ravneh znanja jezika (A1–B2). Podatki iz anket in testov znanja so analizirani s pomočjo ANOVA ter korelacijske in regresijske analize. Rezultati so pokazali, da študenti in študentke z višjo ravnjo znanja (B1/B2) pogosteje in intenzivneje uporabljajo pogovorne sisteme. Vedenjska in kognitivna vključenost močno korelirata z izboljšanjem jezikovnih spretnosti, medtem ko čustvena vključenost nima pomembnega vpliva. Regresijska analiza je pokazala, da sta vedenjska in kognitivna vključenost ključna napovednika napredka v znanju angleščine, kar kaže, da sta aktivna uporaba in zaznana koristnost pogovornih sistemov glavna dejavnika pri učenju jezika. Ugotovitve potrjujejo potencial pogovornih sistemov kot učinkovitih učnih pripomočkov.

Ključne besede: umetna inteligenca, pogovorni sistemi, kognitivna vključenost, čustvena vključenost, znanje angleškega jezika, indonezijski študenti
1 Introduction

In Indonesia, English proficiency has become increasingly important as the country continues to engage with the global economy and participate in international collaborations. As global networks expand and the demand for skilled professionals rises, English becomes a key tool for communication and knowledge exchange (Ward and Given 2019). Universities and higher education institutions in Indonesia face the challenge of ensuring that their graduates are competitive on the world stage, which requires a strong command of the English language (Logli 2016). Graduates with advanced English proficiency are better positioned to secure employment, collaborate on international projects, and participate in global research initiatives. However, the level of English proficiency among Indonesian students often falls short of expectations, creating a significant gap in their ability to succeed in these global contexts. Consequently, developing practical educational tools and strategies to improve English proficiency has become a pressing priority for educators and policymakers in Indonesia.

One innovative solution that has garnered attention is using Artificial Intelligence (AI)-powered chatbots to support language learning. These chatbots, engineered to replicate human-like dialogue, provide students with a dynamic and tailored educational experience that can be adjusted to individual requirements (Waziana et al. 2024). Chatbots can provide real-time feedback, guide students through language exercises, and offer practice in a safe, low-pressure environment. This feature proves especially advantageous for individuals who experience apprehension or reluctance when speaking English in group settings, as these tools create a supportive environment conducive to language practice.
Current research highlights that AI-based conversational agents foster higher levels of student engagement and motivation by delivering adaptable, interactive learning experiences beyond those achievable in standard teacher-led environments (Alsawaier 2018; Huang, Hew, and Fryer 2022). Additionally, these chatbots can help students practice language skills at their own pace, promoting self-regulated learning and fostering deeper language acquisition (Chang et al. 2023). By providing immediate feedback, these tools support the iterative process of learning, which is essential for mastering a language. Nevertheless, while AI-powered conversational agents are increasingly integrated into educational frameworks, empirical investigation into their targeted effects on English language proficiency development remains notably limited. Although current research has investigated the broader pedagogical advantages of AI-driven technologies in learning environments (Slamet 2024; Waziana et al. 2024; Gayed et al. 2022; Nurchurifiani et al. 2025; Zulianti et al. 2024), very few empirical studies have focused on the correlation between learner interaction with AI conversational systems and measurable advancements in English linguistic proficiency. The effectiveness of these tools in enhancing learners’ English proficiency, especially in comparison to traditional methods of language instruction, remains an area that requires further investigation. Scholarly inquiries have examined the pedagogical applications of chatbots in stimulating learner motivation and enriching educational outcomes (Kim, Cha, and Kim 2021; Silitonga et al. 2023), but there is a gap in understanding how sustained engagement with chatbots leads to measurable improvement in language proficiency. Further empirical investigation is warranted to determine the efficacy
of chatbots in facilitating linguistic competence, especially in the context of non-English-speaking countries like Indonesia. Furthermore, the methodological frameworks and integration strategies underpinning AI chatbot interfaces within language acquisition pedagogy continue to undergo iterative refinement. While some chatbots have been designed to focus on speaking and writing practice, others offer more general support for reading and listening skills. The diversity of AI tools available on the market presents a challenge for educators and researchers trying to pinpoint which features and functions of these tools are most beneficial for improving language proficiency. Studies suggest that chatbots with personalized feedback mechanisms, goal-setting features, and adaptive learning paths are more likely to enhance student engagement and language development (Huang, Hew, and Fryer 2022; Chang et al. 2023). These findings underscore the necessity for specialized scholarly investigation into strategies for optimizing chatbots to facilitate discrete components of linguistic acquisition, including but not limited to syntactic accuracy, lexical expansion, phonological precision, and communicative fluency. Despite these promising developments, challenges remain in integrating chatbots into mainstream language education. Scholarly discourse has raised concerns that excessive dependence on technological systems within educational settings may inadvertently diminish the frequency and quality of direct interpersonal engagement among educators and fellow learners (Zou et al. 2023; Hastomo, Mandasari, and Widiati 2024). Moreover, while chatbots offer substantial utility in facilitating language practice, they are inherently limited in replicating the nuanced dynamics and depth inherent to human communicative exchanges.
Consequently, educators retain an indispensable role in scaffolding learners’ linguistic development, despite the proliferation of sophisticated artificial intelligence applications. This necessitates a paradigm shift towards strategically incorporating these technologies into pedagogical frameworks to augment rather than supplant conventional instructional approaches.

This research aims to address the gap in understanding the role of chatbots in enhancing English proficiency by investigating the relationship between Indonesian university students’ engagement with AI-powered chatbots and improvement in their English language skills. By focusing on Indonesian university students in an EFL context, this study seeks to contribute valuable insights into the potential of AI tools to address the English proficiency gap in Indonesia. Specifically, the study will examine how students’ engagement with chatbots correlates with improved English proficiency. By addressing these objectives, the research aims to advance scholarly discourse on AI’s transformative potential within educational paradigms, particularly its capacity to redefine language acquisition methodologies in digitally mediated environments. The investigation is structured around the following research questions:

1. How does the engagement of Indonesian university students with chatbots differ by English proficiency level?
2. How does the engagement of Indonesian university students with chatbots correlate with their English proficiency?
3. What predictive role does the engagement of Indonesian university students with chatbots play in their English proficiency?

2 Literature Review

2.1 Student Engagement in Language Learning

Student engagement refers to students’ cognitive, emotional, and behavioural involvement in their learning activities.
Cognitive engagement involves the mental effort students apply to understanding and integrating new information, while emotional engagement reflects students’ feelings about, and attitudes and motivations towards learning. Behavioural engagement is observable through active participation and consistent task effort (Al-Obaydi et al. 2023). Engagement in language acquisition processes is pivotal, as it facilitates knowledge retention, cultivates analytical reasoning, and strengthens problem-resolution capabilities among learners. The more engaged students are, the better they can grasp complex language concepts, such as grammar, vocabulary, and pronunciation. Active participation in classroom activities, whether speaking, listening, reading, or writing, fosters deeper learning. Moreover, engaged students tend to have a positive attitude toward language learning, which increases their persistence and resilience in overcoming challenges. Emotional engagement also affects students’ motivation, creating a sense of belonging and interest in the subject. Therefore, language educators must design engaging lessons that stimulate students’ intellectual curiosity and emotional connection to the language. By cultivating learner engagement, educators can establish an interactive educational setting that encourages proactive involvement, enabling students to collaboratively shape and invest in their academic development. Research indicates that increased student engagement leads to better language learning outcomes. For instance, when students actively participate in discussions, role-playing activities, or group projects, they are more likely to develop stronger communication skills and language fluency (Hastomo et al. 2024). Engagement enhances academic performance and students’ ability to interact effectively in real-life situations, making it essential for second language acquisition. 
Additionally, when students are emotionally engaged in the subject, they pursue language learning beyond the classroom, thus improving their overall proficiency. Behavioural engagement, such as practicing language skills outside class or seeking feedback, also supports ongoing improvement in language proficiency. Studies have shown that engaged students are more willing to use language learning tools and participate in extracurricular language-related activities, such as clubs or online forums. Consequently, the role of student engagement in language learning cannot be overstated, as it directly correlates with academic success and language mastery. Language teachers who integrate strategies to foster engagement, such as interactive activities and personalized learning experiences, contribute significantly to their students’ development (Moreira et al. 2018). Engaging students through diverse activities enhances their motivation and language acquisition process, thus improving proficiency. One effective way to increase student engagement in language learning is by integrating technology, such as AI tools. These tools offer customizable instructional trajectories, thereby accommodating learners’ capacity to autonomously modulate their progression rates in alignment with current competency thresholds (Oktarin et al. 2024). These tools can provide immediate feedback and guidance, enhancing students’ cognitive engagement. Gamification and interactive features facilitate emotional engagement, which makes language learning more enjoyable. Additionally, chatbots can adapt instructional materials in alignment with learner performance metrics, thereby sustaining an equilibrium between cognitive rigor and developmental feasibility. This adaptability encourages continuous engagement, as students can progress without feeling overwhelmed or bored.
Thus, integrating AI technology, such as chatbots, into language learning offers exciting opportunities to improve student engagement. It provides a unique solution to cater for diverse learning styles and abilities, ensuring all students have the necessary resources and support to succeed.

2.2 Chatbots as a Language Learning Tool

AI chatbots have become a prominent tool for language learning because of their accessibility, flexibility, and personalization capabilities. These tools serve as accessible, interactive modalities for autonomous language skill development beyond formal instructional environments. Contemporary platforms such as ChatGPT, Gemini, and Perplexity scaffold adaptive pedagogical frameworks, empowering learners to refine lexical acquisition, syntactic mastery, and compositional fluency through a self-regulated pace of progression (Waziana et al. 2024). These tools allow students to receive immediate feedback on their language production, which helps improve their writing and speaking proficiency. Chatbots are available 24/7, providing learners with consistent opportunities to practice without time or location limitations. This round-the-clock availability encourages continuous learning, making language practice more integrated into students’ daily lives. Additionally, chatbots support self-regulated learning by offering students a structured yet flexible learning environment. Through task-based activities, students can engage in meaningful conversations or writing exercises that align with their learning needs and goals. Therefore, chatbots contribute to an enhanced learning experience by offering a combination of accessibility, flexibility, and personalized feedback that supports language acquisition. One significant advantage of chatbots is their ability to personalize learning experiences. These tools can tailor exercises and tasks to suit individual learners’ proficiency levels and learning styles.
For example, Duolingo adjusts the difficulty of exercises based on how well a student performs, ensuring that the learning experience remains challenging but not overwhelming (Sari, Hastomo, and Nurchurifiani 2023). ChatGPT, on the other hand, allows users to ask questions or engage in conversations in English, providing real-time, context-aware responses that help improve language skills (Slamet 2024). These personalized features increase students’ motivation by ensuring they can practice at a level appropriate for their abilities. Moreover, chatbots give students a sense of autonomy, as they can choose when and how to engage with the tool (Shikun et al. 2024). This autonomy enhances their emotional engagement by cultivating a perception of control over their educational trajectory. By providing immediate, personalized feedback, these tools also foster cognitive engagement, encouraging students to reflect on their mistakes and improve their language proficiency over time. As a result, chatbots are valuable tools that can enhance language learning by providing flexibility, personalization, and immediate feedback. Despite the advantages of AI chatbots, challenges remain regarding their integration into language learning. Studies have raised concerns about the limitations of AI in understanding complex human emotions or providing nuanced feedback (Casal and Kessler 2023; Rudolph, Tan, and Tan 2023; Thorp 2023; Baskara 2023). While AI can assist in grammar and vocabulary exercises, it may struggle with understanding the subtleties of conversational language, such as tone or cultural context. Furthermore, chatbots are not a replacement for human interaction, which is essential for developing communicative competence in language learning.
Therefore, while AI tools like chatbots offer valuable assistance in practice, they should be used alongside traditional language instruction to provide a balanced learning experience. Additionally, learners may experience frustration if they encounter AI limitations, thus affecting their motivation and engagement. Educators must ensure that students are aware of these limitations and guide them in using chatbots effectively to complement their language learning. In sum, chatbots have emerged as a transformative innovation in language pedagogy, delivering customized, readily accessible, and adaptable educational frameworks that substantially enhance learners’ linguistic competences.

2.3 Student Engagement and English Proficiency

The Common European Framework of Reference for Languages (CEFR) is widely recognized as the principal standard for evaluating English language proficiency, stratifying linguistic ability into six sequential tiers: A1 (Basic User), A2 (Elementary), B1 (Intermediate), B2 (Upper Intermediate), C1 (Advanced), and C2 (Mastery). The CEFR tiers serve as indicators of learners’ receptive and productive competences in English, spanning from foundational to advanced mastery (Kim 2021). The A1 tier denotes foundational linguistic capabilities, whereas C2 approximates native-like mastery in both fluency and accuracy. By offering a systematic methodology for assessing communicative abilities, the CEFR enables instructors to pinpoint developmental needs and design targeted pedagogical interventions aligned with learners’ proficiency trajectories. The link between student engagement and English proficiency is well-documented, as engaged students are more likely to advance through these levels (Karabiyik 2019). When cognitively engaged in challenging tasks, students develop a deeper understanding and mastery of the language.
Emotional engagement, fuelled by interest and motivation, encourages students to persist through complex tasks, improving their proficiency over time. Behavioural engagement, such as practicing English outside the classroom, further reinforces this development. Empirical research underscores a significant correlation between learner agency in educational processes and enhanced linguistic competence, as sustained cognitive engagement and deliberate practice catalyse the assimilation of target language structures (Hastomo and Septiyana 2022). Engagement in learning activities significantly contributes to language proficiency, especially when students actively participate in meaningful tasks. For example, students who engage in regular writing practice or group discussion tend to improve their communication skills and achieve higher proficiency levels (Yu, Jiang, and Zhou 2020). Task-based learning, where students are encouraged to use language in practical contexts, fosters deeper engagement and accelerates language acquisition. When students engage with authentic materials, such as news articles, movies, or social media content, they are exposed to real-world language use, contributing to more natural and functional language proficiency. Additionally, interaction with native speakers or engagement with AI-driven platforms such as chatbots offer learners meaningful avenues to refine their oral and written communication skills through structured practice and contextual feedback. The more frequently students engage in these activities, the more likely they are to progress through the CEFR levels. Thus, student engagement is essential for improving language skills and achieving higher proficiency levels as measured by the CEFR framework. Research has also shown that engagement can influence the speed at which students progress through the CEFR levels.
Research indicates that learners exhibiting consistent engagement and active participation in language acquisition demonstrate accelerated progression through proficiency tiers compared to peers with limited involvement (Tian and Zhou 2020; Shen et al. 2023). This is particularly evident in language proficiency exams, where students with higher engagement levels often score better. Additionally, integrating technology, such as chatbots, can further enhance student engagement and accelerate their progression through the CEFR levels. By offering personalized practice opportunities and immediate feedback, AI tools can help students improve their language skills more efficiently. Therefore, fostering student engagement is key to ensuring progress and success in language learning, as it directly influences the development of English proficiency across various CEFR levels.

3 Research Methodology

This study aimed to examine and quantify the correlations between Indonesian university students’ engagement with chatbots and their English proficiency, using a quantitative research design. The research design is illustrated in Figure 1.

Figure 1. Research design.
3.1 Participants

The participants in this study are university students from three public universities in Lampung (Sumatra), Malang (Java), and Pontianak (Kalimantan), representing three major islands in Indonesia. A total of 150 students from these institutions were selected for the study, all of whom met the inclusion criteria: they are undergraduate students who do not major in English and have varying levels of English proficiency as defined by the CEFR. The participants are categorized into four proficiency levels: A1 (beginner), A2 (elementary), B1 (intermediate), and B2 (upper-intermediate). Specifically, 72 students are at the A1 level (48.0%), 50 students at the A2 level (33.3%), 22 students at the B1 level (14.7%), and six students at the B2 level (4.0%). Demographically, the sample reflects regional diversity: Lampung represents a semi-urban area in southern Sumatra with moderate IT infrastructure; Malang is an urban educational hub in East Java with relatively advanced technological accessibility; and Pontianak is a city in West Kalimantan with developing IT resources. Including students across these geographically and socioeconomically distinct regions enables the study to account for variability in technological access, a critical factor in AI chatbot adoption. The students, being non-English majors, were chosen because their interaction with English was typically less frequent, providing valuable insight into how chatbots might influence students who are not primarily focused on language learning. By selecting this diverse sample spanning multiple Indonesian islands, the study aims to explore how engagement with chatbots can impact language development in a non-intensive English learning environment, while acknowledging contextual limitations in technological accessibility.
3.2 Instruments

This study utilized two primary instruments for data collection: the Engagement Questionnaire and the English Proficiency Test. The Engagement Questionnaire was adapted from Xu and Li (2024) and aimed to evaluate the degree of student engagement in using chatbots for language learning, focusing on three critical aspects: behavioural, cognitive, and emotional engagement. Behavioural engagement was assessed by measuring the frequency and duration of chatbot usage, while cognitive engagement explored student perceptions of the chatbot’s usefulness in enhancing their learning process. Emotional engagement measured student motivation and satisfaction while interacting with the chatbot. The instrument utilized a 5-point Likert scale, with responses spanning from never (1) to always (5), allowing participants to systematically and measurably self-assess their engagement levels. The behavioural engagement section included questions such as, “How often do you use the AI chatbot to practice English?” and “How much time do you typically spend using the chatbot each week?” Cognitive engagement was measured through items like, “Do you find the chatbot helpful in improving your English skills?” and “Do you actively think about how to use the language while interacting with the chatbot?” Emotional engagement was gauged through prompts such as, “Using the chatbot motivates me to learn English” and “I feel satisfied after using the chatbot for language practice.” This multi-dimensional approach provided a comprehensive understanding of student engagement with the AI tool by capturing perspectives from different facets of interaction. The English Proficiency Test was designed to assess participants’ current proficiency levels based on the CEFR framework, covering four key language skills. Administered before the engagement questionnaire, this test established the baseline English proficiency of the participants.
The results categorized students into A1, A2, B1, or B2 proficiency levels. The test ensured a holistic evaluation of each participant’s language ability by addressing all four language skills.

3.3 Data Collection

The data collection process was carried out in two main phases: administering the English Proficiency Test and completing the Engagement Questionnaire. The English Proficiency Test was conducted first to determine the participants’ current level of language proficiency. This test was distributed to all participants online, ensuring students could complete it conveniently. The test was timed, and students were given a fixed duration to complete all sections. The results of this test were used to classify participants into one of the four CEFR levels (A1, A2, B1, or B2), which were crucial for analysing engagement differences based on proficiency level. Following the proficiency assessment, the Engagement Questionnaire was administered electronically via Google Forms to the same cohort. The instrument was structured to prioritize user-friendly design and accessibility, facilitating efficient digital completion. Participants were instructed to respond to items reflecting their recent utilization of chatbots for language acquisition purposes. The anonymous questionnaire encouraged honest responses and reduced social desirability bias. The data collected from the questionnaire provided insights into how students engage with chatbots and how their engagement levels relate to their language proficiency. Ethical protocols were rigorously integrated throughout the study’s methodological procedures. Written informed consent was secured from all participants prior to their commencement of the proficiency assessment and engagement survey. The consent documentation explicitly detailed the research objectives, participants’ right to withdraw without penalty, and assurances regarding the anonymity and secure handling of all collected data.
The data was stored securely and used exclusively for research purposes. Personal information was kept anonymous to ensure the privacy of the participants.

3.4 Data Analysis

This study employed quantitative analytical techniques, encompassing both descriptive and inferential approaches, to investigate potential correlations between learner engagement and English proficiency outcomes. Descriptive analyses were conducted to systematically summarize the dataset derived from the Engagement Questionnaire, with central tendency (mean) and dispersion (standard deviation) metrics calculated for each engagement dimension – behavioural, cognitive, and emotional – across the four CEFR proficiency tiers (A1–B2). These computations facilitated a comparative evaluation of aggregate engagement tendencies and variability patterns among learner subgroups. Specifically, the mean values elucidated average engagement intensity per proficiency tier, while standard deviation measures revealed intra-group heterogeneity in engagement patterns, thereby contextualizing the uniformity or divergence of learner experience within each cohort. A one-way analysis of variance (ANOVA) was performed to investigate statistically significant variations in engagement levels across distinct proficiency tiers. This statistical test compared the means of engagement scores (behavioural, cognitive, and emotional) across the four proficiency groups (A1, A2, B1, and B2). A significant result from the ANOVA would indicate that engagement patterns differ significantly between students at different proficiency levels. This analysis helped to determine whether students at higher proficiency levels (B1, B2) were more or less engaged with chatbots compared to students at lower proficiency levels (A1, A2).
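The one-way ANOVA described above compares the variance of group means around the grand mean with the variance of individual scores within their own groups. A minimal sketch in Python, using invented engagement scores for the four CEFR groups rather than the study’s data, illustrates the computation:

```python
# Illustrative only: synthetic engagement scores for four CEFR groups,
# not the study's data. Computes the one-way ANOVA F statistic by hand.

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    k = len(groups)                           # number of groups
    n = sum(len(g) for g in groups)           # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of scores around their own group mean
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical mean engagement scores per student, grouped by CEFR level
a1 = [3.0, 3.2, 3.1, 3.4, 3.3]
a2 = [3.2, 3.4, 3.3, 3.5]
b1 = [3.4, 3.5, 3.6, 3.3]
b2 = [3.6, 3.7, 3.5]

f_stat, df_b, df_w = one_way_anova_f([a1, a2, b1, b2])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

With real data, the resulting F statistic is compared against the F distribution with (k − 1, n − k) degrees of freedom to obtain the p-value reported in the results section.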
A Pearson correlation coefficient analysis was conducted to assess the association between composite engagement metrics (behavioural, cognitive, emotional) and English proficiency scores. This analysis quantified both the magnitude and directional tendency of the relationship, with positive coefficients denoting proportional alignment between elevated engagement and higher proficiency outcomes, while negative values signified an inverse association. The results elucidated the extent to which engagement variables collectively influenced linguistic competency development. A multiple linear regression analysis was conducted to evaluate the predictive impact of learner engagement dimensions on English linguistic competency. Within this statistical framework, behavioural, cognitive, and emotional engagement metrics were operationalized as independent variables, while proficiency test results served as the dependent outcome measure. This analysis allowed for identifying the specific aspects of engagement that most strongly predict students’ language proficiency. By considering all three types of engagement simultaneously, the regression analysis provided a comprehensive understanding of how each form of engagement influences language proficiency.

4 Research Results

4.1 The Engagement of Indonesian University Students with Chatbots

The analysis of Indonesian university students’ engagement with AI chatbots across English proficiency levels (A1, A2, B1, and B2) revealed notable patterns, as summarized in Table 1. Analytical findings revealed that learners across all CEFR tiers exhibited intermediate engagement levels with AI-driven conversational interfaces within language acquisition contexts.

Table 1. Engagement with Chatbots across English proficiency levels.
English Proficiency Level   Mean   SD     Level of Engagement
A1                          3.21   0.45   Moderate
A2                          3.32   0.43   Moderate
B1                          3.45   0.40   Moderate
B2                          3.63   0.38   Moderate

When analysing specific dimensions of engagement, students at the A1 and A2 levels reported similar patterns of moderate behavioural and cognitive engagement, as shown in Table 2. Emotional engagement was reported as slightly lower for these levels than other dimensions, reflecting challenges in maintaining motivation and satisfaction during chatbot interactions. As illustrated in Table 3, learners at the B1 and B2 proficiency levels demonstrated elevated cognitive and behavioural engagement relative to their A1 and A2 counterparts, reflecting more profound immersion in and valuation of AI chatbot pedagogical interventions. Emotional engagement also improved, reflecting increased motivation and satisfaction among students with higher proficiency levels.

Table 2. Engagement dimensions for A1 and A2 students.

Engagement Dimension     A1 Mean   A1 SD   Level of Engagement   A2 Mean   A2 SD   Level of Engagement
Behavioural Engagement   3.28      0.47    Moderate              3.35      0.45    Moderate
Cognitive Engagement     3.15      0.50    Moderate              3.27      0.48    Moderate
Emotional Engagement     3.02      0.53    Moderate              3.13      0.51    Moderate

Table 3. Engagement dimensions for B1 and B2 students.

Engagement Dimension     B1 Mean   B1 SD   Level of Engagement   B2 Mean   B2 SD   Level of Engagement
Behavioural Engagement   3.52      0.44    Moderate              3.68      0.41    Moderate
Cognitive Engagement     3.46      0.46    Moderate              3.60      0.43    Moderate
Emotional Engagement     3.32      0.50    Moderate              3.47      0.48    Moderate

The engagement of Indonesian university students with chatbots varies across English proficiency levels.
The One-Way ANOVA results revealed significant differences in overall engagement levels among the four proficiency groups (F(3, 706) = 6.15, p < 0.001). Post Hoc Tukey HSD tests showed that B2 students demonstrated significantly higher engagement levels than A1 (p = 0.001) and A2 (p = 0.005) students. Additionally, B1 students displayed higher engagement than A1 students (p = 0.028). These results suggest that students with greater English proficiency tend to engage more actively with chatbots, particularly in the cognitive and behavioural dimensions. Emotional engagement also improved incrementally with proficiency, highlighting the importance of personalized experiences in fostering satisfaction and motivation.

4.2 Correlation Between Indonesian University Students’ Engagement with Chatbots and English Proficiency

The analysis results revealed significant and non-significant correlations between Indonesian university students’ engagement with AI chatbots and their English proficiency. The overall engagement score did not show a substantial relationship with English proficiency. However, when specific dimensions of engagement were examined, significant positive correlations were found between behavioural and cognitive engagement and English proficiency. Conversely, emotional engagement exhibited no significant correlation with English proficiency.

Table 4. Results of Pearson’s correlation analyses.

                               Behavioural   Cognitive    Emotional    Overall
                               Engagement    Engagement   Engagement
English Proficiency      r     .142**        .128**       .046
                         p     .001          .003         .143
Behavioural Engagement   r     1             .593**       .524**
                         p                   .000         .000
Cognitive Engagement     r                   1            .486**
                         p                                .000
Emotional Engagement     r                                1
Overall                  r                                             1

* Significant at the 0.05 level (two-tailed). ** Significant at the 0.01 level (two-tailed).
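Pearson’s r reported in Table 4 is the covariance of two variables scaled by the product of their standard deviations. A small illustrative computation, on synthetic engagement and proficiency scores rather than the study’s data, shows how a positive association yields a positive coefficient:

```python
# Illustrative only: Pearson's r computed by hand on synthetic
# (engagement score, proficiency score) pairs, not the study's data.
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sum of cross-products of deviations (unscaled covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the sums of squared deviations
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical behavioural engagement scores and proficiency test scores
engagement = [3.1, 3.3, 3.2, 3.6, 3.8, 3.4]
proficiency = [52, 58, 55, 66, 71, 60]

r = pearson_r(engagement, proficiency)
print(f"r = {r:.3f}")  # strongly positive: higher engagement, higher scores
```

A coefficient near +1 indicates that students with higher engagement also score higher; values near 0, as for emotional engagement in Table 4, indicate no linear association.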
The findings indicate that behavioural and cognitive engagement, which reflect active participation and the perceived usefulness of chatbots, are positively associated with the level of English proficiency. Emotional engagement, while important, did not exhibit a direct correlation, suggesting that motivation and satisfaction alone may not directly enhance proficiency. These results underscore the importance of fostering active and cognitively engaging interactions with chatbots to support language learning outcomes.

4.3 Predictive Roles of Indonesian University Students’ Engagement with Chatbots in English Proficiency

Multiple linear regression was performed to predict the English proficiency levels of Indonesian university students based on their engagement with chatbots. As presented in Table 5, the results demonstrated that behavioural engagement, cognitive engagement, and overall engagement were significant predictors of students’ English proficiency. However, this analysis found that emotional engagement had no meaningful predictive value for English proficiency.

Table 5. Results of regression analyses.

Model                    B      Std. Error   Beta   t       Sig.
Behavioural Engagement   .145   .054         .098   2.687   .008
Cognitive Engagement     .120   .048         .102   2.502   .013
Emotional Engagement     .036   .051         .029   .705    .481
Overall Engagement       .140   .050         .123   2.812   .005

These findings suggest that behavioural engagement, cognitive engagement, and overall engagement with the AI chatbot enhance English proficiency. In contrast, emotional engagement alone does not show a significant impact. The regression analysis indicates that students who engage more frequently and intensely (both behaviourally and cognitively) with the chatbot are likely to have higher levels of English proficiency. However, although important for motivation, emotional engagement did not emerge as a predictor in this context.
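The multiple regression in Table 5 fits proficiency as a linear function of the three engagement dimensions. The sketch below is a minimal ordinary least squares implementation via the normal equations, run on synthetic data constructed so that proficiency depends on behavioural and cognitive but not emotional engagement; all numbers are invented for illustration and the fitted coefficients recover that structure:

```python
# Illustrative only: OLS with three engagement predictors, solved via
# the normal equations. Synthetic data in which proficiency depends on
# behavioural and cognitive engagement but not on emotional engagement,
# mirroring the pattern of significant and non-significant predictors.

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    """Return OLS coefficients: solve (X'X) beta = X'y."""
    n = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(n)]
    return solve(XtX, Xty)

# Columns: intercept, behavioural, cognitive, emotional (hypothetical scores)
X = [[1, 3.0, 3.1, 3.4], [1, 3.2, 3.0, 3.1], [1, 3.5, 3.4, 3.2],
     [1, 3.7, 3.6, 3.5], [1, 3.3, 3.5, 3.0], [1, 3.6, 3.2, 3.3]]
y = [52.6, 53.6, 58.4, 61.2, 57.4, 58.0]  # = 10 + 8*beh + 6*cog exactly

beta = ols(X, y)
print([round(b, 3) for b in beta])  # emotional coefficient comes out near zero
```

In the study itself, the standardized Beta, t, and Sig. columns of Table 5 would additionally be derived from the coefficient standard errors; the sketch shows only the coefficient estimation step.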
The results emphasize the value of fostering more active and meaningful interactions with AI tools to improve language learning outcomes.

5 Discussion

This study sought to explore Indonesian university students’ engagement with AI chatbots in language learning and how this engagement correlates with their English proficiency. The results yield critical insights into the pedagogical efficacy of AI-facilitated language acquisition technologies and their measurable impact on learner performance metrics. Three key points worth discussing emerged from the data: the relationship between engagement and proficiency, the differences in engagement across proficiency levels, and the predictive role of engagement in English proficiency. The study’s initial findings revealed moderate engagement levels among learners utilizing AI-assisted language learning tools, with this trend remaining consistent across all proficiency levels, which mirrors the findings of previous studies in similar contexts (Xu and Li 2024; Oktarin et al. 2024). These findings align with the existing literature suggesting that learners typically demonstrate moderate engagement in educational activities mediated by chatbots, a pattern observed consistently across varying levels of linguistic proficiency. However, the study found notable differences when examining engagement in specific activities, such as frequency and duration of chatbot use. Higher proficiency students (B1, B2) tended to engage more frequently and for more extended periods than their lower proficiency counterparts (A1, A2). This finding supports the idea that more proficient learners will likely find more value in these tools, possibly because of their ability to understand and apply the language more effectively in interactive scenarios.
The findings suggest learners demonstrate intermediate engagement when utilizing chatbots for language acquisition, with this pattern persisting uniformly across all proficiency levels (Mageira et al. 2022). Second, regarding engagement and its correlation with English proficiency, the data indicate a significant positive relationship between learner interaction with AI-driven conversational platforms and advances in English linguistic competence. This aligns with existing scholarship emphasizing the efficacy of interactive pedagogical technologies in facilitating language acquisition (Yuan and Liu 2025). Notably, participants exhibiting elevated engagement metrics with these systems achieved superior performance outcomes on standardized proficiency measures. This suggests that engagement is not only a measure of the time spent interacting with learning tools but also an indicator of active cognitive and emotional involvement in the learning process, which can enhance language skills. These results support the argument made by Guo et al. (2023), who observed that engagement with educational technology tools, such as chatbots, can foster deeper learning and improve academic performance. Moreover, the study’s analysis revealed significant differences in engagement between proficiency levels. This was particularly noticeable in students’ more frequent and sustained interactions with the chatbots at higher proficiency levels.
Students at the A1 and A2 levels tended to have more limited engagement with the chatbot, focusing primarily on basic tasks and responses. In contrast, students at the B1 and B2 levels were more likely to engage in complex conversations and tasks, suggesting they could leverage the chatbot tools for more advanced learning opportunities. Consistent with previous research (Huang, Hew, and Fryer 2022), learners with advanced linguistic proficiency exhibit heightened efficacy in deploying educational technologies to cultivate lexical growth, grammatical accuracy, and holistic language mastery. The data also showed that engagement at the lower proficiency levels often involved more basic tasks, such as vocabulary drills or simple sentence construction, reflecting the learners’ limited linguistic ability. Lastly, when examining the predictive role of student engagement in English proficiency, the study found that engagement, particularly in frequency and duration of interaction with the chatbot, played a significant role in predicting English proficiency. The regression analysis results indicated that the more engaged students were with the AI chatbot, the higher their English proficiency tended to be. This is consistent with prior research on the predictive power of engagement in language learning (Alahmari and Alrabai 2024). Specifically, the study found that engagement with chatbots was a stronger predictor of English proficiency than other factors, such as traditional classroom instruction. This suggests that chatbots, with their interactive and personalized nature, provide an effective means of language practice that can lead to better proficiency over time. The study also found that engagement in emotionally and cognitively challenging tasks with the chatbot was particularly beneficial for improving language skills, as it encouraged deeper language processing (Schuetzler, Grimes, and Giboney 2020).
6 Conclusions

This study investigated the engagement of non-English majors at three Indonesian universities with chatbots in language learning and its correlation with their English proficiency. The findings revealed that within this specific context, higher engagement levels, particularly in frequency and duration of interaction with the chatbot, were positively associated with higher English proficiency. Students with higher proficiency levels demonstrated more active and sustained engagement with the chatbots, reflecting their ability to leverage these tools for more advanced learning opportunities. However, these patterns may not extend to other student populations or educational settings. The study also highlighted the predictive role of student engagement in their English proficiency among the sampled population. It demonstrated that the frequency and intensity of engagement with chatbots could significantly predict improvements in English proficiency within this cohort. These results suggest that chatbots could serve as a valuable tool for enhancing language proficiency among similar student populations, particularly in contexts with comparable demographics and learning environments. Educators in analogous settings might consider integrating AI-assisted learning tools into their curricula to foster engagement and improve language outcomes, though further research is needed to validate these implications for broader application. While this study offers valuable insights, there are some important limitations to consider. The small number of B2-level participants (n=6) makes it difficult to draw strong conclusions about this subgroup, particularly in statistical analyses like regression – though results for the A1, A2, and B1 groups remain reliable. Additionally, since the research was conducted at three universities in Indonesia, its findings may not fully apply to students in other regions or cultural contexts.
Another limitation lies in the methodology: relying solely on surveys and proficiency tests might overlook the lived experiences of students interacting with chatbots. Incorporating qualitative approaches, such as interviews or classroom observations, could uncover richer details about their challenges and behaviours. Future studies should aim for larger, more diverse participant pools across all proficiency levels, blending quantitative and qualitative methods, and exploring how AI tools impact language learning over time. Understanding what drives student engagement with these technologies will also be essential for creating tailored, effective learning strategies.

Acknowledgement

The authors would like to express their sincere gratitude to the Center for Higher Education Funding and Assessment (PPAPT) and the Indonesia Endowment Fund for Education (LPDP) at the Ministry of Finance of the Republic of Indonesia for their support and funding of this research.

References

Alahmari, Arwa, and Fakieh Alrabai. 2024. “The predictive role of L2 learners’ resilience in language classroom engagement.” Frontiers in Education 9. https://doi.org/10.3389/feduc.2024.1502420.

Al-Obaydi, Liqaa Habeb, Farzaneh Shakki, Ragad M. Tawafak, Marcel Pikhart, and Raed Latif Ugla. 2023. “What I know, what I want to know, what I learned: Activating EFL college students’ cognitive, behavioral, and emotional engagement through structured feedback in an online environment.” Frontiers in Psychology 13:1083673. https://doi.org/10.3389/fpsyg.2022.1083673.

Alsawaier, Raed S. 2018. “The effect of gamification on motivation and engagement.” The International Journal of Information and Learning Technology 35 (1): 56–79. https://doi.org/10.1108/IJILT-02-2017-0009.

Baskara, FX Risang. 2023. “Integrating ChatGPT into EFL writing instruction: Benefits and challenges.” International Journal of Education and Learning 5 (1): 44–55. https://doi.org/10.31763/ijele.v5i1.858.

Casal, J. Elliott, and Matt Kessler. 2023. “Can linguists distinguish between ChatGPT/AI and human writing?: A study of research ethics and academic publishing.” Research Methods in Applied Linguistics 2 (3): 100068. https://doi.org/10.1016/j.rmal.2023.100068.

Chang, Daniel H., Michael Pin Chuan Lin, Shiva Hajian, and Quincy Q. Wang. 2023. “Educational design principles of using AI chatbot that supports self-regulated learning in education: Goal setting, feedback, and personalization.” Sustainability 15 (17): 12921. https://doi.org/10.3390/su151712921.

Gayed, John Maurice, May Kristine Jonson Carlon, Angelu Mari Oriola, and Jeffrey S. Cross. 2022. “Exploring an AI-based writing assistant’s impact on English language learners.” Computers and Education: Artificial Intelligence 3:100055. https://doi.org/10.1016/J.CAEAI.2022.100055.

Guo, Kai, Yuchun Zhong, Danling Li, and Samuel Kai Wah Chu. 2023. “Investigating students’ engagement in chatbot-supported classroom debates.” Interactive Learning Environments 31 (6): 1–17. https://doi.org/10.1080/10494820.2023.2207181.

Hastomo, Tommy, Muhammad Fikri Nugraha Kholid, Pipit Muliyah, Linda Septiyana, and Widi Andewi. 2024. “Exploring how video conferencing impacts students’ cognitive, emotional, and behavioral engagement.” Journal of Educational Management and Instruction 4 (2): 213–25. https://doi.org/10.22515/jemin.v4i2.9335.

Hastomo, Tommy, Berlinda Mandasari, and Utami Widiati. 2024. “Scrutinizing Indonesian pre-service teachers’ technological knowledge in utilizing AI-powered tools.” Journal of Education and Learning (EduLearn) 18 (4): 1572–81. https://doi.org/10.11591/edulearn.v18i4.21644.

Hastomo, Tommy, and Linda Septiyana. 2022. “The investigation of students’ engagement in online class during pandemic COVID-19.” Jurnal Penelitian Ilmu Pendidikan 15 (2). https://doi.org/10.21831/JPIPFIP.V15I2.49512.
Huang, Weijiao, Khe Foon Hew, and Luke K. Fryer. 2022. “Chatbots for language learning – Are they really useful? A systematic review of chatbot-supported language learning.” Journal of Computer Assisted Learning 38 (1): 237–57. https://doi.org/10.1111/jcal.12610.

Karabiyik, Ceyhun. 2019. “The relationship between student engagement and tertiary level English language learners’ achievement.” International Online Journal of Education and Teaching 6 (2): 281–93. https://eric.ed.gov/?id=EJ1248494.

Kim, Hea Suk, Yoonjung Cha, and Na Young Kim. 2021. “Effects of AI chatbots on EFL students’ communication skills.” Korean Journal of English Language and Linguistics 21: 712–34. https://doi.org/10.15738/kjell.21.202108.712.

Kim, Susie. 2021. “Generalizability of CEFR criterial grammatical features in a Korean EFL corpus across A1, A2, B1, and B2 levels.” Language Assessment Quarterly 18 (3): 273–95. https://doi.org/10.1080/15434303.2020.1855647.

Logli, Chiara. 2016. “Higher education in Indonesia: Contemporary challenges in governance, access, and quality.” In The Palgrave Handbook of Asia Pacific Higher Education, edited by Christopher S. Collins, Molly N.N. Lee, John N. Hawkins and Deane E. Neubauer, 561–81. Palgrave Macmillan US. https://doi.org/10.1057/978-1-137-48739-1_37.

Mageira, Kleopatra, Dimitra Pittou, Andreas Papasalouros, Konstantinos Kotis, Paraskevi Zangogianni, and Athanasios Daradoumis. 2022. “Educational AI chatbots for content and language integrated learning.” Applied Sciences 12 (7): 3239. https://doi.org/10.3390/app12073239.

Moreira, Paulo A.S., Adelaide Dias, Carla Matias, Jorge Castro, Tânia Gaspar, and Joana Oliveira. 2018. “School effects on students’ engagement with school: Academic performance moderates the effect of school support for learning on students’ engagement.” Learning and Individual Differences 67 (October): 67–77. https://doi.org/10.1016/J.LINDIF.2018.07.007.
Nurchurifiani, Eva, Aksendro Maximilian, Galuh Dwi Ajeng, Purna Wiratno, Tommy Hastomo, and Andri Wicaksono. 2025. “Leveraging AI-powered tools in academic writing and research: Insights from English faculty members in Indonesia.” International Journal of Information and Education Technology 15 (2): 312–22. https://doi.org/10.18178/ijiet.2025.15.2.2244.
Oktarin, Irene Brainnita, Maria Edistianda Eka Saputri, Betty Magdalena, Tommy Hastomo, and Aksendro Maximilian. 2024. “Leveraging ChatGPT to enhance students’ writing skills, engagement, and feedback literacy.” Edelweiss Applied Science and Technology 8 (4): 2306–19. https://doi.org/10.55214/25768484.v8i4.1600.
Rudolph, Jürgen, Samson Tan, and Shannon Tan. 2023. “ChatGPT: Bullshit spewer or the end of traditional assessments in higher education?” Journal of Applied Learning & Teaching 6 (1): 342–63. https://doi.org/10.37074/jalt.2023.6.1.9.
Sari, Lusi Purnama, Tommy Hastomo, and Eva Nurchurifiani. 2023. “Assessing the efficacy of Duolingo for acquiring English vocabulary skills: Experimental research.” Journal of English Teaching Applied Linguistics and Literatures 6 (2): 193–200.
Schuetzler, Ryan M., G. Mark Grimes, and Justin Scott Giboney. 2020. “The impact of chatbot conversational skill on engagement and perceived humanness.” Journal of Management Information Systems 37 (3): 875–900. https://doi.org/10.1080/07421222.2020.1790204.
Shen, Chen, Penghai Shi, Jirong Guo, Suyun Xu, and Jiwei Tian. 2023. “From process to product: Writing engagement and performance of EFL learners under computer-generated feedback instruction.” Frontiers in Psychology 14 (October): 1–13. https://doi.org/10.3389/fpsyg.2023.1258286.
Shikun, Shan, Gevorg Grigoryan, Ning Huichun, and Hasmik Harutyunyan. 2024. “AI chatbots: Developing English language proficiency in EFL classroom.” Arab World English Journal 1 (1): 292–305. https://doi.org/10.24093/awej/ChatGPT.20.
Silitonga, Lusia Maryani, Santhy Hawanti, Feisal Aziez, Miftahul Furqon, Dodi Siraj Muamar Zain, Shelia Anjarani, and Ting Ting Wu. 2023. “The impact of AI chatbot-based learning on students’ motivation in English writing classroom.” In Innovative Technologies and Learning, 6th International Conference, ICITL 2023, Porto, Portugal, August 28–30, 2023, Proceedings, edited by Yueh-Min Huang and Tânia Rocha, 542–49. Springer. https://doi.org/10.1007/978-3-031-40113-8_53.
Slamet, Joko. 2024. “Potential of ChatGPT as a digital language learning assistant: EFL teachers’ and students’ perceptions.” Discover Artificial Intelligence 4 (1): 46. https://doi.org/10.1007/s44163-024-00143-2.
Thorp, H. Holden. 2023. “ChatGPT is fun, but not an author.” Science 379 (6630): 313. https://doi.org/10.1126/science.adg7879.
Tian, Lili, and Yu Zhou. 2020. “Learner engagement with automated feedback, peer feedback and teacher feedback in an online EFL writing context.” System 91 (July): 102247. https://doi.org/10.1016/j.system.2020.102247.
Ward, Wesley S., and Lisa M. Given. 2019. “Assessing intercultural communication: Testing technology tools for information sharing in multinational research teams.” Journal of the Association for Information Science and Technology 70 (4): 338–50. https://doi.org/10.1002/asi.24159.
Waziana, Winia, Widi Andewi, Tommy Hastomo, and Muhamad Hasbi. 2024. “Students’ perceptions about the impact of AI chatbots on their vocabulary and grammar in EFL writing.” Register Journal 17 (2): 328–62. https://doi.org/10.18326/register.v17i2.352-382.
Xu, Jinfen, and Juan Li. 2024. “Effects of AI affordances on student engagement in EFL classrooms: A structural equation modelling and latent profile analysis.” European Journal of Education 59 (4): e12808. https://doi.org/10.1111/ejed.12808.
Yu, Shulin, Lianjiang Jiang, and Nan Zhou. 2020.
“Investigating what feedback practices contribute to students’ writing motivation and engagement in Chinese EFL context: A large scale study.” Assessing Writing 44: 100451. https://doi.org/10.1016/j.asw.2020.100451.
Yuan, Lingjie, and Xiaojuan Liu. 2025. “The effect of artificial intelligence tools on EFL learners’ engagement, enjoyment, and motivation.” Computers in Human Behavior 162: 108474. https://doi.org/10.1016/j.chb.2024.108474.
Zou, Bin, Xin Guan, Yinghua Shao, and Peng Chen. 2023. “Supporting speaking practice by social network-based interaction in artificial intelligence (AI)-assisted language learning.” Sustainability 15 (4): 2872. https://doi.org/10.3390/su15042872.
Zulianti, Hajjah, Hastuti Hastuti, Eva Nurchurifiani, Tommy Hastomo, Aksendro Maximilian, and Galuh Dwi Ajeng. 2024. “Enhancing novice EFL teachers’ competency in AI-powered tools through a TPACK-based professional development program.” World Journal of English Language 15 (3): 117. https://doi.org/10.5430/wjel.v15n3p117.
Part IV: English Language and Literature Teaching
2025, Vol. 22 (1), 113-131(228)
journals.uni-lj.si/elope
https://doi.org/10.4312/elope.22.1.113-131
UDC: [811.111+811.112.2’243:37.09.3]:004.89
Saša Jazbec, Bernarda Leva, Marta Licardo
University of Maribor, Slovenia
AI Is Here to Stay: An Empirical Study of Attitudes Among Teachers of English and German
ABSTRACT
Artificial intelligence (AI) is a disruptor increasingly impacting foreign language learning and teaching. This paper explores the theoretical framework of AI, its application in foreign language teaching, and the question of whether AI is displacing foreign language teachers. The empirical part presents findings from a survey of English and German teachers (n = 112) in Slovenian primary and secondary schools regarding their views on AI in foreign language teaching.
Statistical analysis reveals a constructively critical attitude towards AI among teachers, who acknowledge its presence in and influence on teaching strategies, methods, and teacher roles but do not perceive it as a fundamental threat. Furthermore, statistical tests and correlations indicate no significant differences in attitudes towards AI in the classroom based on whether respondents teach English or German or whether they work in primary or secondary schools.
Keywords: AI, disruption, teaching English and German as a foreign language, challenges, problems
UI je prišla in bo ostala: empirična raziskava o stališčih učiteljev in učiteljic angleščine in nemščine
IZVLEČEK
Umetna inteligenca (UI) je kot disrupcija močno posegla tudi v učenje in poučevanje tujega jezika. V prispevku najprej osvetlimo teoretski okvir pojmovanja UI, razpravljamo o UI pri pouku tujega jezika in se posvečamo tudi vprašanju, ali UI izpodrinja učitelje in učiteljice tujega jezika. V empiričnem delu predstavljamo izsledke raziskave, v kateri so svoja stališča o UI pri pouku tujega jezika izrazili učitelji in učiteljice angleščine in nemščine (n = 112) v osnovnih in srednjih šolah v Sloveniji. Statistična analiza podatkov anketiranih je pokazala, da so do UI konstruktivno kritični, da se zavedajo njene prisotnosti in da zelo vpliva na strategije, metode dela pri pouku in delo učiteljev in učiteljic, jih spreminja, a jih ne ogroža. S statističnimi testi in korelacijami pa smo ugotavljali tudi, da ni statistično pomembnih razlik med stališči anketiranih do UI pri pouku glede na to, ali učijo angleščino ali nemščino, niti ne, ali delajo v osnovni ali v srednji šoli.
Ključne besede: UI, disrupcija, pouk angleščine in nemščine kot tujega jezika, izzivi, problemi
1 Introduction
Elias inspires Carinthian students in Austria as a sports teacher and students in Finland as an English teacher; Charlie supports primary school students in Switzerland in developing social skills and dealing with emotions; Pepper enjoys teaching pupils at a school in Serbia. Elias, Charlie, and Pepper are evidently excellent, popular teachers, yet they are also humanoid robots. They function based on artificial intelligence (AI); they learn and teach what is being taught to them, and they use both factual knowledge and non-verbal communication. Watching students communicate with humanoid robots is fascinating but also frightening. Students are motivated; they listen; they are willing to imitate the robot in sports; they smile when their errors are corrected; they endeavour to do better and are happy when the robot praises them and is satisfied with their work or performance. The enthusiastic engagement of students with humanoid robots offers a compelling glimpse into the potential of AI to shape learning experiences. However, the increasing sophistication of AI in education presents both opportunities and challenges. While AI-supported tools have become essential, the disruptive nature of this technology requires that educators adapt and prepare for significant change. After such a disruption, returning to the status quo ante is no longer possible. We must therefore accept AI as a new reality in education – which is the focus of this article – and develop strategies and procedures that enable teachers and AI to work together harmoniously and optimally in the educational process. This paper aims to present the conceptual framework of artificial intelligence in foreign language teaching. It also shares selected findings from an empirical survey of foreign language teachers in Slovenia, i.e. teachers of English and German, regarding the use of AI in their classrooms.
Finally, it explores potential differences in perspective between English and German teachers on this topic. Within the context of various dilemmas posed by the use of AI in education, this paper seeks, among other things, theoretical and empirical answers to the vital question of whether AI will ultimately replace the foreign language teacher. Comparable questions also formed the starting point for empirical research by the Vodafone Foundation in Germany. Their target audience, however, was not classroom teachers but citizens, or parents of school-age children. They conducted an interesting, topical, and representative study on AI in schools with 5,000 citizens and 500 parents of school-age children, with the meaningful title Expedition into the Unknown (Vodafone 2023, 1–24).1 Below, we summarise the most important findings. Analysis of the results reveals that slightly more than half the respondents (the study states a majority) believe that AI will significantly change the future of the classroom (54%). Although at the time of the survey they were still sceptical about the use of AI in school, seeing it as a threat rather than an opportunity (57%), they also wanted AI to become part of the curriculum (55%). The study explains this seemingly paradoxical finding by noting that those who understand that AI (e.g., ChatGPT) will remain part of our lives want children to be ready for this challenge. Respondents also believe that developing digital competences is primarily the responsibility of schools (77%) and only then of parents.
1 More than 5,000 German citizens aged 18+ and 500 parents with school-age children up to 18 years participated in the study. The empirical data was collected over three days, from 23 March 2023 to 25 March 2023, in an open online panel (Vodafone 2023).
Interestingly, two-thirds of the respondents also agreed that the regulation of the use of AI in school should be determined at the school level and not, as is common for school regulation in Germany, at the level of the federal state (cf. Vodafone 2023). The question of whether artificial intelligence will replace “natural intelligence,” i.e. the teacher, in the future is not a dilemma for the respondents of the Vodafone study, as 90% of them do not think this will happen. Having explored the perspectives of citizens and parents on AI in schools, it is crucial to establish a clear understanding of what exactly this term means. The subsequent section will explore the definition and historical context of artificial intelligence.
2 Artificial Intelligence
2.1 Artificial Intelligence – A Conceptual Framework
The term artificial intelligence was first used in 1956 by a group of experts at Dartmouth College as part of the Summer Research Project on Artificial Intelligence (1956). The experts set themselves the goal of describing the learning process and the characteristics of intelligence in such detail that they could develop a machine that could simulate this process (Ramge 2018, 33). The term artificial intelligence has since been frequently used in publications addressing the Turing test. Several experts, including Kačič (2024), have explored the appropriateness of the term artificial intelligence. According to Kačič, drawing on definitions from The Britannica Dictionary, intelligence is the ability to learn, understand and make judgements or form opinions based on reason, and the ability to cope with novel or tricky situations. The adjective artificial denotes a physical substitute with equivalent functionality to a natural counterpart (artificial hip, artificial knee, artificial tooth, etc.) and is used in various contexts, including technology and medicine (cf. Kačič 2024).
Since artificial intelligence does not have the equivalent functionality of natural intelligence, and since it learns but does not have the ability to judge, to understand what it has learnt or to hold an opinion, Kačič proposes the term virtual intelligence. Despite the conceptual appropriateness of the term virtual intelligence, Kačič (2024; cf. also De Florio-Hansen 2020, 46)2 acknowledges that the term artificial intelligence is so deeply rooted and so widely used that it would be difficult or downright impossible to change. A well-known and often quoted technicist definition of AI was drafted by the OECD in 2023 and revised in 2024: “An AI system is a machine-based system that, for explicit or implicit objectives, infers, from the input it receives, how to generate outputs such as predictions, content, recommendations, or decisions that can influence physical or virtual environments. Different AI systems vary in their levels of autonomy and adaptiveness after deployment” (OECD 2024). The European Parliament, more accessibly and tellingly, but also less precisely, has stated that “AI is the ability of a machine to display human-like capabilities such as reasoning, learning, planning and creativity” (European Parliament).
2 In addition to the term artificial intelligence, terms for the opposing type of intelligence, such as human intelligence, natural intelligence and non-AI, appear in professional publications. Interestingly, however, there is not the same unanimity in professional and popular circles when it comes to naming these intelligences as there is for artificial intelligence.
These definitions, and many others, are unanimous in ascribing to a machine, system, or programme capabilities similar to those of a human being, i.e. thinking, reasoning, learning, and communicating.
This understanding of AI will be the starting point for this paper.
2.2 Artificial Intelligence – A Development Framework
Since 1956, the development of artificial intelligence, and of the tools it supports, has experienced exponential growth. Despite this rapid development, experts distinguish between AI Winters and AI Summers. During an AI Winter, progress continues, but AI receives less attention from both experts and the public. Conversely, an AI Summer is characterised by intense development, with AI at the centre of both expert and non-expert discussions. Current predictions and analyses suggest we are in an extended AI Summer, with some even suggesting a “perpetual summer” (cf. Rubanau 2024). Experts also categorise AI into weak and strong AI (e.g., Wong 2020; Miao et al. 2021). Weak AI refers to tools and systems that focus on, and are highly successful at, solving specific problems (e.g., language learning and translation tools such as Duolingo, Grammarly, and Duden Mentor) (Marr 2018, 21). Strong AI, also called superintelligence, aims to create systems of neural networks that mimic human brain function, including the interpretation of emotions, feelings, and context, and that are capable of learning on their own. While numerous tools are powered by weak AI, those supported by strong AI are still evolving. It is the latter that have become a cause for concern and fear; the pace of development is breakneck, while the development of systems to control and monitor their use is lagging far behind. This concern was highlighted by a widely publicised open letter in early 2023, signed by leading technology figures, calling for a six-month moratorium on the development of AI systems more powerful than GPT-4. They argued that the development was too fast for legal certainty and that the risks to humans and humanity were too significant (Clarke 2023).
Despite this call, the proposal for a moratorium has not been implemented, and development is proceeding at its own rapid pace, as evidenced by numerous publications and studies. Above all, AI is increasingly permeating and transforming the educational space, including foreign language learning.
3 Artificial Intelligence in Foreign Language Learning
3.1 Literature Review
The number of publications on AI in and for education is growing exponentially. Experts from diverse fields (computer scientists, but also psychologists, philosophers, linguists, neuroscientists, economists, politicians, translators, etc.) are writing about AI with the common goal of getting to know, understanding and exploring the potentials and limitations of AI as much as possible. At the time of writing, for example, the University of Maribor’s electronic resources system lists 64,166 publications on artificial intelligence and education from the last five years in English (search string: artificial intelligence and education), 195 in German (search string: künstliche Intelligenz und Bildung), and 21 in Slovene (search string: umetna inteligenca) (e.g., UM:NIK 2025). There has been a surge of publications reviewing and analysing AI research. In particular, the number of discussions increased when OpenAI released ChatGPT, a revolutionary application, made freely available on 30 November 2022 (Hong 2023, 38). ChatGPT is a chatbot that can conduct a convincing dialogue with an interlocutor and offers a wide range of possibilities that go beyond traditional pedagogical procedures (Baskara and Mukarto 2023). Although it does not understand the questions but generates answers according to the principles of frequency and relevance (Thorp 2023, 313), it has made particularly strong inroads into the (foreign) language learning and teaching process.
Further possibilities and pitfalls of using ChatGPT in learning and teaching will not be discussed in this paper because of its limited scope (e.g., Hong 2023; Kasneci et al. 2023; Kartal 2023; Dolenc and Brumen 2024; Tica and Krsmanović 2024); ChatGPT will be considered as one of the AI tools in foreign language teaching. Among the contributions in Slovene that are of interest to the Slovene pedagogical area, we would like to highlight the following: 1) a monograph on contemporary perspectives on society and artificial intelligence (Bregant, Aberšek and Borstner 2022), which brings together high-profile scientific contributions in which AI is discussed interdisciplinarily and critically from the perspectives of computer scientists, psychologists and educators; and 2) a scientific monograph on the use of generative AI in education (Žerovnik and Zapušek 2024), which lays the theoretical groundwork for the innovative and practical use of AI in education, discusses the ethical aspects of its use, and identifies guidelines for the integration of generative AI in education. In addition to the above, there is a vast number of master’s and bachelor’s theses, as well as lectures, seminars, forums, and portals where users can learn about the practical possibilities of using tools, most often ChatGPT, in school. It is up to each individual to assess the quality, professionalism, criticality, accountability, and marketing interests behind these resources. The resources in English that are also of interest to the Slovenian pedagogical area are almost innumerable. We would like to highlight two key sources: First, an interdisciplinary scientific monograph by Licardo and Lipovec (2024) that explores the intersection of AI literacy and social-emotional skills within the educational context.
The contributions in this monograph are empirical studies conducted in Slovenia that address the technical and ethical aspects of AI while also providing deeper insight into social-emotional learning. The main purpose of the studies is to show, in a theoretically grounded and empirically supported way, how AI and social-emotional skills, as transversal competences, can be developed and integrated into educational frameworks. The second key source is a scientific paper by Dolenc and Brumen (2024) that focuses on foreign language teaching and investigates social science and computer science students’ perceptions of the integration and use of AI-based technologies in education. The empirical results highlighted an interesting aspect that is not often discussed in the context of AI and education, i.e. the importance of the gender and discipline of the teacher to the introduction of AI in education. Social science students and women are generally less inclined to use AI tools in foreign language education, often expressing doubts about their ability to enhance academic performance. These groups tend to be more critical of, or cautious about, the role AI plays in language learning. While they acknowledge that AI can be a useful tool to enrich the learning process, they also emphasise the irreplaceable value of human teachers in education. This empirical research is particularly relevant for the development of guidelines for teacher education, which usually do not consider the importance of the gender and professional profile of the teacher. There are also many papers on the question of whether AI will replace the teacher (e.g., Chan and Tsi 2024; Bouras 2024; Pettersson et al. 2024; Knaus 2024).
As a point of interest, we summarise an excursus by Knaus, who reflects from an educator’s perspective on whether teachers are still needed in the world of AI. Knaus (2024) observes that the question itself reflects a dystopian vision that runs like a thread through the history of media: as soon as a technical innovation has potential similar to a teacher’s, there is talk that it may displace them. Thus, at the beginning of his book, Knaus reports that technical innovation was once credited with breaking down the teacher’s “information monopoly.” School television, programmed learning, language labs, Virtual Learning Environments (VLEs), Personal Learning Environments (PLEs) and Massive Open Online Courses (MOOCs) could likewise be labelled as attempts at an educational revolution, each aiming to distribute information more widely and potentially displace the teacher. Knaus believes that, despite AI systems that are undoubtedly excellent, this will not happen, because the learning process is not only about interaction and the communication of knowledge (which AI can do) but also about relationships, the development of individuals, enculturation, social integration, and social competences, which can only be developed in society, in contact with other human beings (cf. Knaus 2024, 20–21).
3.2 Challenges and Problems of Using Artificial Intelligence in Foreign Language Teaching
Learning and teaching foreign languages has long been done with the help of ICT. Most foreign language teachers are familiar with ICT and use it regularly in their work. In the professional literature, this type of learning and teaching is called Computer-Assisted Language Learning (CALL) or Mobile-Assisted Language Learning (MALL). However, with developments in natural language processing, advances in deep and networked learning, and the increasing technological capacity to handle big data, Intelligent Computer-Assisted Language Learning (ICALL) has evolved.
On the one hand, Intelligent Computer-Assisted Language Learning systems have brought about a fundamental qualitative change in student-computer interaction (Kannan and Munday 2018); on the other hand, they have severely disrupted existing pedagogical formats of foreign language learning and teaching. Alongside this relativisation of existing pedagogical formats, ICALL has also sparked a series of controversial debates and reflections on the necessity and reasonableness of using AI for learning and teaching, as well as on the dangers and disruptive changes that its imminent use seems to imply (e.g., Strasser 2020; Dargan 2019; Renz et al. 2020). The biggest problems, fears, and legitimate dangers of AI in foreign language learning and teaching faced by teachers, decision-makers, students, and parents revolve around several key questions. These include the role of both the foreign language teacher and the learner in the new concepts of AI-assisted learning; issues of authorship, ethics, and copyright; issues of personal data protection and the regulation of AI use; and issues of the goals and competences to be developed in foreign language teaching, knowledge, testing, etc. Tica and Krsmanović (2024) address these concerns by emphasising student apprehensions about ChatGPT’s limitations. Students often worry that such tools may not effectively cultivate deep linguistic competence or critical thinking. Moreover, fears of plagiarism, diminished originality and shallow engagement with learning materials make some students reluctant to rely on AI. These concerns suggest that AI should complement rather than replace traditional teaching methods, serving as a supportive resource rather than a primary instructional tool. Despite the intense debates in this area, the issues are far too complex for us to expect answers soon, or even in step with technological developments.
This is particularly true in the field of education, where change is extremely slow and the gap between technological development and realised change at the implementation level is the greatest. Moreover, the media habitus of teachers (and decision-makers) lags far behind media developments (cf. Hartmann 2021; Burow 2022). Beyond the challenges and problems, it should be emphasised, and the expert community agrees, that AI will not (at least not for some time) replace the teacher and traditional learning and teaching formats, but it will change and complement them (cf. Renz, Krishnaraja and Gronau 2020; Hartmann 2023).3
4 Artificial Intelligence in Foreign Language Teaching in Slovenia – Findings from Empirical Research
In the empirical part, we present the views of foreign language (English and German) teachers in Slovenia on the use of artificial intelligence in foreign language learning and teaching. We start from the thesis that foreign language teachers in Slovenia are mostly hesitant towards the use of AI, that they do not consider AI to be serious competition for them in the future, and that there is no difference between the views of English and German teachers (cf. Jazbec 2024). The research questions below guided our analysis and were addressed using a survey questionnaire; the data collected provide the basis for the quantitative analysis and interpretation. While this study provides a quantitative overview, in-depth analyses of teachers’ attitudes, experiences, and practices, including the nuances of their perspectives, would require qualitative research and further interpretation of the data collected and the theoretical starting points. At the outset, it is essential to acknowledge that the data presented must be read and understood within the context of the study’s limitations. Several limitations should be considered when interpreting the findings of this study.
The sample was non-random, consisting of teachers who chose to participate. This could introduce selection bias, as those who are more interested in or favourable towards AI may have been more likely to respond.
3 Bill Gates made a similar point: “AI will never replace teachers, but it is going to revolutionise teaching & learning” (ASU+GSV Summit, San Diego, 2023).
Owing to the non-random sampling method, the results of the study cannot be generalised to all foreign language teachers in Slovenia or to other contexts. Since the survey was self-administered and anonymous, there is a potential for response bias. Teachers may have provided responses they perceived as more socially acceptable or favourable regarding their professional use of AI, such as overestimating their current use or expressing more positive attitudes than they genuinely hold.
4.1 Method
The purpose of the study is to gain insight into the views and beliefs of English and/or German language teachers on the use of artificial intelligence in foreign language teaching. The research questions were as follows:
• What are the beliefs of foreign language teachers on the use of AI in the future? Do they perceive AI as an opportunity or a threat? Are there significant differences in opinion between English and German language teachers on whether AI is an advantage or a threat in the classroom?
• What are the views of foreign language teachers on the role of the teacher in AI use and the impact of AI use on learning? Are there differences between German and English language teachers on these issues?
• What are the correlations between beliefs about the potential of AI to improve teaching in schools, beliefs about the possibility that AI will not completely replace foreign language teachers in the future, and perceptions of the effects of AI on positive changes in student learning habits?
• Do teachers know whether their students use AI for learning, and is there a difference between primary school teachers and upper secondary school teachers on this question?
4.2 Participants and Data Collection
The survey involved 112 foreign language teachers, including 46 German teachers, 41 English teachers, 19 teachers of both English and German, and 6 teachers of other languages or subject areas. Of the teachers in the sample, 44% teach at primary schools, 51% at upper secondary schools, and 5% elsewhere. Most teachers have 21 to 30 years of work experience (36%), followed by teachers with up to 10 years of experience (29%), then 11 to 20 years (26%), and the smallest proportion have 31 to 40 years of experience (9%). It can be concluded that the study involved experienced teachers, as more than two-thirds of the surveyed teachers have more than ten years of work experience. It is a non-random sample, and generalisation of the results is not possible. The profile of the respondents closely mirrors the overall population of foreign language teachers in Slovenia (Eurydice 2021/2022): comparable percentages are employed in primary and upper secondary schools, and German and English are equally represented in terms of the teacher profile. Also, most surveyed teachers have at least ten years of experience working in schools. Data were collected through a survey published on the online survey platform 1ka. Respondents could fill out the survey from May 2023 to August 2023. In the survey, they consented to the collection of data and the publication of results.
The survey is anonymous, and the data are processed at the group level.

4.3 The Instrument

This study employed a survey instrument designed to collect anonymous data, ensuring confidentiality and privacy. The instrument is designed to assess the attitudes and experiences of foreign language teachers concerning AI in education, comparing these to the broader teacher population in Slovenia. This allows for a detailed exploration of how AI is viewed within the educational context by those directly impacted by its integration. It includes 13 questions and utilises 51 variables to gather comprehensive insights. The questions cover a range of topics, including the current and potential future role of artificial intelligence in teaching, teachers’ perceptions of AI as an opportunity or threat, and the practical uses of AI in educational settings. The response options across the questions include Likert-type scales (e.g., strongly agree to strongly disagree), dichotomous choices (e.g., yes or no), and multiple-choice questions where respondents can select more than one answer. Specific questions explore the integration of AI in school, lesson planning, and the evaluation of student performance.

4.4 Analysis, Results and Interpretation

The analysis was conducted using descriptive and inferential statistics in SPSS. To analyse differences between teachers of English and German, as well as between primary and upper secondary school teachers, we used the t-test. For analysing correlations between individual variables, we used Pearson’s correlation coefficient.

4.4.1 Using AI in Teaching: Opportunity or Threat?4

Figure 1. Percentage (f%) of teachers’ responses on whether AI will significantly change teaching in the future.

Figure 2. Percentage (f%) of teachers’ responses on how they perceive the possibilities of using AI in teaching.
Figure 1 reveals that more than 80% of teachers believe that artificial intelligence will significantly change teaching in the future, while only 14% think this will not happen. Notably, German language teachers hold even stronger positive convictions (“absolutely yes”) than their English language counterparts.

Figure 2 demonstrates that the foreign language teachers in Slovenia are not hesitant about using AI in schools. Specifically, 59.1% of German teachers and 65.9% of English teachers view AI in schools as an opportunity or a significant opportunity, while 38.6% of German teachers and 26.9% of English teachers perceive the use of AI in schools more as a threat or an absolute threat. This group comparison suggests that German language teachers are more inclined to see AI as a potential threat compared to English language teachers. A comparative analysis of the data with the results of the Vodafone study (Vodafone 2023), where more than half the respondents (57%) saw AI more as a threat than an opportunity, reveals significantly different attitudes among foreign language teachers in Slovenia compared to the attitudes of parents in Germany. We can only hypothesise that the observed difference stems from foreign language teachers’ greater familiarity and experience with ICT tools compared to the parents surveyed in the Vodafone study (2023). Teachers have already recognised and tested the benefits of using AI and have certainly also encountered the pitfalls of AI use (e.g., written assignments as homework in foreign language teaching). Additionally, it is essential to consider in the analysis of survey results that the respondents were teachers who are familiar with AI, think about it, and engage with it.

4 This question and the data in Figures 1 and 2, previously published and discussed for the whole sample in Jazbec (2024), are presented here at the teacher group level. This serves as a foundation for our focus on differences between German and English teachers, the role of the teacher, and a comparison with the Vodafone study, all in the context of the original question.

Results in Table 1 indicate that teachers of English, based on the average response to the statement regarding whether they see the use of AI in schools as an opportunity, are slightly more favourable towards the idea that AI is an opportunity (M = 2.50; SD = 0.98) than teachers of German (M = 2.43; SD = 0.85). However, there are no statistically significant differences between teachers of German and teachers of English (t(80) = -0.33; p = 0.36).
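As an illustration of the procedure behind this comparison (not the authors’ SPSS output), the pooled-variance (Student’s) t statistic can be recomputed from the published summary statistics alone; the small discrepancy from the reported t(80) = -0.33 reflects rounding of the published means and standard deviations. A minimal sketch in Python:

```python
from math import sqrt

def pooled_t_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Student's independent-samples t-test computed from summary
    statistics (group means, standard deviations, and sizes)."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two group variances.
    sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (m1 - m2) / se
    return t, df

# Published summary statistics for the two groups (GEM vs. ENG).
t, df = pooled_t_from_summary(2.43, 0.85, 44, 2.50, 0.98, 38)
print(f"t({df}) = {t:.2f}")  # close to the reported t(80) = -0.33
```

The same computation is available in standard statistical software (e.g., SPSS, or scipy’s summary-statistics t-test) when only aggregated group data are at hand.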
This lack of significant difference is unexpected, given that most current AI tools and training data are primarily in English. One might hypothesise that this would lead English teachers to perceive AI as more readily applicable and a more significant opportunity. This finding aligns with comparative studies of AI in foreign language teaching, which often do not distinguish between target languages, or focus primarily on English (e.g., Yuan 2024; Du and Daniel 2024).

4.4.2 Using AI in Teaching: Perspectives on the Role of the Teacher and Its Impact on Students’ Learning Habits5

Figure 3. Percentage (f%) of teachers’ responses on whether AI could, under certain conditions, provide better instruction in schools than teachers (with natural intelligence).

Figure 4. Percentage (f%) of teachers’ responses on whether AI will not completely replace teachers in the future.

The data in Figures 3 and 4 provide insight into the perspectives of English and German teachers regarding the role and potential of artificial intelligence (AI) in educational contexts. Regarding the potential capability of AI to deliver superior teaching under certain conditions, it is evident that teachers across both language groups display considerable scepticism. Overall, 68.2% of foreign language teachers surveyed disagree or strongly disagree with the assertion that AI could surpass human teachers in instructional effectiveness. Conversely, a quarter (25.4%) of respondents across both groups recognise that, under specific circumstances, AI could indeed outperform human teachers. While this highlights a cautious acknowledgement of AI’s instructional potential, the majority viewpoint clearly favours human teaching competences. When considering the possibility of complete replacement of teachers by AI in the future (Figure 4), there is an even stronger consensus across both teacher groups.
An overwhelming 85.5% of teachers reject the notion of full AI replacement of human teachers, underscoring widespread confidence in the irreplaceability of human educators. Comparatively, these results align closely across both English and German teacher groups, illustrating a shared perception among language educators. Both groups express strong reservations about AI fully replacing human instructors, yet both cautiously acknowledge AI’s supplementary role, contingent upon specific educational conditions. This comparative analysis underlines that perspectives concerning AI’s instructional role appear remarkably consistent. Such unanimity may facilitate future international collaborative efforts aimed at responsibly integrating AI technologies into language education.

5 The data in Figures 3 and 4 were published in Jazbec (2024) at the level of the whole sample. They are presented here at the level of the groups because they are a starting point for analysing differences in attitudes towards AI, a factor with the potential to reshape classrooms and even the role of German or English teachers, which is the central focus of this paper.

Table 1. The t-test for differences between teachers of English and German language in attitudes about whether the use of AI in foreign language teaching represents an opportunity.

I perceive AI and its applications in schools more as an opportunity than a threat.
  Teachers GEM: N = 44, M = 2.43, SD = 0.85; Teachers ENG: N = 38, M = 2.50, SD = 0.98; Levene F = 0.35, p = 0.55; t(80) = -0.33, p = 0.36

The results in Table 2 indicate no significant differences in opinion on whether AI could, under certain conditions, provide better instruction than teachers. For teachers of German (GEM), the mean response was 2.98 (SD = 0.88); for teachers of English (ENG), the mean was 3.16 (SD = 0.96), t(80) = -0.91, p = .18, suggesting a consensus that AI might not entirely outperform teachers’ instruction under existing conditions. Concerning the opinion that AI will not completely replace teachers in the future, significant differences occurred between the two groups. Teachers of German reported a mean of 1.91 (SD = 1.03), indicating more scepticism about AI replacing teachers, whereas teachers of English reported a more optimistic viewpoint with a mean of 1.47 (SD = 0.69), t(82) = 2.25, p = .01. This suggests that teachers of German are more likely to believe that AI will not fully replace human teachers. However, the standard deviation in the group of German teachers is more than one, so these results should be interpreted with caution. Finally, attitudes towards AI’s potential impact on student learning habits also showed no significant difference, although the responses of English teachers leaned towards a more positive view. Teachers of German averaged 2.83 (SD = 1.06), while teachers of English averaged
slightly more optimistic at 3.08 (SD = 1.19), t(82) = -1.03, p = .15. Overall, these findings indicate varied levels of acceptance and scepticism among teachers regarding the role of AI in education.

Table 2. The t-test for differences in attitudes between teachers of English and German language on perspectives on the role of the teacher and its impact on student learning habits.

AI could, under certain conditions, provide better instruction in schools than teachers (with natural intelligence).
  GEM: N = 46, M = 2.98, SD = 0.88; ENG: N = 37, M = 3.16, SD = 0.96; Levene F = 1.39, p = 0.24; t(80) = -0.91, p = 0.18
AI will not completely replace teachers in the future.
  GEM: N = 46, M = 1.91, SD = 1.03; ENG: N = 38, M = 1.47, SD = 0.69; Levene F = 1.17, p = 0.28; t(82) = 2.25, p = 0.01
AI could potentially have a more positive than negative impact on student learning habits in the future.
  GEM: N = 46, M = 2.83, SD = 1.06; ENG: N = 38, M = 3.08, SD = 1.19; Levene F = 0.45, p = 0.51; t(82) = -1.03, p = 0.15

We were also interested in exploring the correlations between beliefs about the potential of artificial intelligence to improve instruction in schools, the belief that artificial intelligence will not completely replace teachers in the future, and the perception of the effects of artificial intelligence on positive changes in student learning habits.

Table 3. Means, standard deviations, reliabilities, and correlations of variables related to Perspectives on the Role of the Teacher and Its Impact on Student Learning Habits for English teachers.

1. AI could, under certain conditions, provide better instruction in schools than teachers (with natural intelligence): N = 40, M = 3.15, SD = 0.95
2. AI will not completely replace teachers in the future: N = 41, M = 1.51, SD = 0.71; correlation with 1: .06
3. AI could potentially have a more positive than negative impact on student learning habits in the future: N = 41, M = 3.10, SD = 1.17; correlation with 1: .44**; with 2: -.12
Note. The variables are measured on a scale from 1 to 4. Higher scores reflect a greater extent of the measured variable. *p<.05, **p<.01

Table 3 presents descriptive statistics and Pearson correlation coefficients among the three key variables for English teachers, reflecting their perspectives on AI’s role in education. A statistically significant positive correlation was found between the belief that AI could provide better instruction than teachers and the belief that AI could have a more positive than negative impact on student learning habits (r = .44, p = 0.004). This finding indicates that English teachers who perceive AI as potentially superior in instructional contexts are also likely to view its influence on student learning habits optimistically. Conversely, there was no significant correlation between the belief that AI will not completely replace teachers and the other two variables, suggesting that English teachers’ concerns about AI replacing teachers are independent of their views on the quality of AI instruction and its impact on student learning.

Table 4. Means, standard deviations, reliabilities, and correlations of variables related to Perspectives on the Role of the Teacher and Its Impact on Student Learning Habits for German teachers.

1. AI could, under certain conditions, provide better instruction in schools than teachers (with natural intelligence): N = 46, M = 2.98, SD = 0.88
2. AI will not completely replace teachers in the future: N = 46, M = 1.91, SD = 1.02; correlation with 1: .09
3. AI could potentially have a more positive than negative impact on student learning habits in the future: N = 46, M = 2.83, SD = 1.06; correlation with 1: .47**; with 2: -.01
Note. The variables are measured on a scale from 1 to 4. Higher scores reflect a greater extent of the measured variable. *p<.05, **p<.01

Table 4 displays descriptive statistics and Pearson correlation coefficients among the three main variables for German teachers, exploring their views regarding AI’s potential in education.
Similarly to the group of English teachers, the results indicate a significant and quite strong positive correlation between the belief in AI’s potential for providing better instruction and the belief that AI could positively affect student learning habits (r = .47, p < .001). This suggests that German teachers who have greater confidence in AI’s instructional capabilities also tend to be optimistic about AI’s beneficial effects on learning habits. However, no significant correlation emerged between the belief that AI will not completely replace teachers and the other measured variables (AI’s instructional quality and AI’s impact on learning habits). This implies that German teachers’ attitudes toward the likelihood of AI replacing human teachers are not associated with their perceptions of AI’s instructional effectiveness or its influence on student learning habits.

4.4.3 The Use of AI Among Students

The questionnaire focused on teachers and their opinions on the use of AI, but in one question, teachers also reflected on what they knew about the use of AI by their students. The data in Figure 5 show that the percentage of teachers who say they know that their students use AI (e.g., ChatGPT) in and for learning is extremely low at 8%. Slightly higher, but still low, is the percentage of teachers who say their students do not use AI in lessons (12%). The highest proportion believe that only some students use AI in and for lessons (44%), or do not know (36%). As the teachers’ responses show, the use of AI by students is very open-ended and left up to individuals, their preferences and needs. How they use AI, for what purposes, and whether they use it critically and constructively enough, or only reproductively and problematically from the point of view of authorship and knowledge acquisition, are questions that will need to be answered in the future, and the systemic basis for doing so will also need to be prepared.
Figure 5. Percentage (f%) of teachers’ responses on whether their students use AI for learning (e.g., ChatGPT).

Table 5. The t-test for differences between primary and upper secondary school teachers on whether their students use AI for learning (e.g., ChatGPT).

Do your students use artificial intelligence (e.g., ChatGPT) for learning?
  Primary school teachers: N = 47, M = 3.06, SD = 0.87; Upper secondary school teachers: N = 54, M = 3.15, SD = 0.88; Levene F = 0.16, p = 0.69; t(99) = -0.48, p = 0.31

The analysis of the data on the differences between secondary and primary school teachers’ knowledge of student use of AI showed that there were no statistically significant differences (t(99) = -0.48; p = .31). These findings are surprising, as we expected secondary school teachers to be more familiar with student use of AI than primary school teachers. Although the mean values show that secondary school teachers are slightly more familiar with it, the differences between them and primary school teachers are not significant.

5 Conclusion

In this paper, starting from the case of humanoid robot teachers and the ubiquity of AI in our lives and schools, we discuss the conceptual framework of AI, including its concept, its evolution and the changes reflected in the field of education. This theoretical background was illuminated by empirical data on the perceptions of foreign language teachers (English and German) of AI in school, particularly in foreign language learning and teaching, and by empirical data on differences in attitudes towards AI according to the teacher’s professional profile. AI, AI-powered tools, and humanoid robots are posing major challenges for schools, teachers, students, and decision-makers.
Given their capabilities, their rapid growth, and the disruptive changes they bring, these technologies seem to have become a permanent part of the education landscape. In addition to the development of AI, there is an intense debate at the discursive level, covering definitions of AI, analyses of its developmental phases, meta-studies on AI research (in schools), and several studies that address the technological, social, psychological, anthropological, and philosophical dimensions of the impact of AI on humans. Foreign language teaching has always been supported by various media, and AI is another medium that is profoundly shaping and changing it. AI supports the user in solving linguistic and non-linguistic problems efficiently, quickly, and often too “elegantly.” We sought to shed empirical light on all these theoretical orientations, assumptions, and experiences with AI in school from the perspective of the direct actors, i.e., foreign language teachers of English and German. The results of the study, despite the limitations we have identified, provide an illustration of and orientation for further work and research.

The findings reveal diverse perspectives among teachers regarding the role of AI in education. The majority believe that AI will significantly influence teaching in the future. German language teachers tend to express stronger opinions than English teachers, although both groups appear open to integrating AI in educational settings. Slightly more English teachers perceive AI as having potential, while a higher percentage of German teachers view it as a potential threat; however, these differences are not statistically significant. Despite some reservations, both groups demonstrate cautious optimism, viewing AI as a supportive tool rather than a replacement for human educators. The prevailing view is that AI will not replace teachers but can enhance teaching practices when implemented effectively.
The study did not find substantial differences between English and German teachers in how they perceive AI’s potential to improve instruction. German teachers were more likely to believe AI could not fully replace human teaching, though this should be interpreted with caution, given the standard deviation observed in the data. Teachers’ understanding of students’ use of AI remains limited. Many are unsure whether students are using AI at all. There were no statistically significant differences between primary and secondary teachers regarding this awareness.

Theory and empirical data support two views: 1) AI should be seen as an effective tool, an assistant that can optimise foreign language learning and teaching wherever gaps have been perceived, e.g., individualised learning, differentiation, motivation to learn through timely feedback, and above all support for the teacher in time-consuming administrative tasks; and 2) the theoretical background, research and empirical data presented above (this research and Vodafone 2023) show that the role of the teacher in school and in foreign language learning is stable, and that AI currently does not pose a threat as a substitute teacher for either English or German.

When considering AI in schools and foreign language teaching, we must acknowledge and address diametrically opposed yet legitimate perspectives from both theoretical and empirical standpoints. Chomsky warns against using AI, succinctly describing it as “sophisticated high-tech plagiarism” (Chomsky 2024). The slightly younger Hartmann, an expert and researcher on AI in foreign language teaching, draws a parallel to the German Emperor Wilhelm II, who in the early days of the automobile was convinced that it was a passing phenomenon and believed in the horse (Hartmann 2023). Connecting these viewpoints, we can concur with Chomsky’s assessment of AI as sophisticated plagiarism.
However, we must also recognise the validity of both Emperor Wilhelm’s scepticism about technological advancement and Hartmann’s assertion that AI’s disruptive influence on schools, learning, and foreign language instruction is here to stay.

References

Baskara, Risang, and Mukarto Mukarto. 2023. “Exploring the implications of ChatGPT for language learning in higher education.” Indonesian Journal of English Language Teaching and Applied Linguistics 7 (2): 343–58. https://ijeltal.org/index.php/ijeltal/article/view/1387.
Bouras, Sana. 2024. “AI and the bad teacher dilemma.” Journal of Science and Knowledge Horizons 4 (1): 39–57.
Bregant, Janez, Boris Aberšek, and Bojan Borstner. 2022. Contemporary Perspectives of Society: Artificial intelligence at the interface of science. Univerzitetna založba.
Burow, Olaf-Axel. 2022. Schule der Zukunft: Sieben Handlungsoptionen. Schule leiten. Beltz.
Chan, Cecilia Ka Yuk, and Louisa H. Y. Tsi. 2024. “Will generative AI replace teachers in higher education? A study of teacher and student perceptions.” Studies in Educational Evaluation 83: 101395. https://doi.org/10.1016/j.stueduc.2024.101395.
Chomsky, Noam. 2024. “Noam Chomsky on artificial intelligence, ChatGPT.” Through Conversations Podcast. Video, 5 min., 37 sec. https://www.youtube.com/watch?v=_04Eus6sjV4.
Clarke, Laurie. 2023. “Alarmed tech leaders call for AI research pause.” Science, April 11. https://www.science.org/content/article/alarmed-tech-leaders-call-ai-research-pause.
Dargan, James. 2019. “Artificial intelligence: The angel of death for foreign language teachers.” Medium, April 29. https://chatbotslife.com/artificial-intelligence-the-angel-of-death-for-foreign-language-teachers-cbff644a4967.
De Florio-Hansen, Inez. 2020. Digitalisierung, Künstliche Intelligenz und Robotik: Eine Einführung für Schule und Unterricht. utb.
Dolenc, Kosta, and Mihaela Brumen. 2024.
“Exploring social and computer science students’ perceptions of AI integration in (foreign) language instruction.” Computers and Education: Artificial Intelligence 7: 1–13. https://doi.org/10.1016/j.caeai.2024.100285.
Du, Jinming, and Ben Kei Daniel. 2024. “Transforming language education: A systematic review of AI-powered chatbots for English as a foreign language speaking practice.” Computers and Education: Artificial Intelligence 6: 100230. https://doi.org/10.1016/j.caeai.2024.100230.
European Parliament. 2023. “What is artificial intelligence and how is it used?” Topics, European Parliament, June 20. https://www.europarl.europa.eu/topics/en/article/20200827STO85804/what-is-artificial-intelligence-and-how-is-it-used.
Eurydice. 2021/2022. “Vzgoja in izobraževanje v Sloveniji.” https://eurydice.sio.si/publikacije/Vzgoja-in-izobrazevanje-v-RS-2021-22.pdf.
Hartmann, Daniela. 2021. “Künstliche Intelligenz im DaF-Unterricht? Disruptive Technologien als Herausforderung und Chance.” Informationen Deutsch als Fremdsprache 48 (6): 683–96. https://doi.org/10.1515/infodaf-2021-0078.
—. 2023. “Ersetzt die KI das Schreiben? ChatGPT & Co im DaF-Unterricht.” Cornelsen Fortbildungsveranstaltung, June 6, online from 16:00 to 16:45.
Hong, Wilson Cheong Hin. 2023. “The impact of ChatGPT on foreign language teaching and learning: Opportunities in education and research.” Journal of Educational Technology and Innovation 5 (1): 37–45.
Jazbec, Saša. 2024. “Umetna inteligenca oziroma orodja, podprta z umetno inteligenco, pri pouku in za pouk tujih jezikov: empirična raziskava o stališčih učiteljev tujega jezika v Sloveniji.” Ars & Humanitas 18 (1): 115–30. https://doi.org/10.4312/ars.18.1.115-130.
Kačič, Zdravko. 2024. “Kako inteligentna je umetna inteligenca?” Delo, Sobotna priloga, January 13.
https://www.delo.si/sobotna-priloga/kako-inteligentna-je-umetna-inteligenca.
Kannan, Jaya, and Pilar Munday. 2018. “New trends in second language learning and teaching through the lens of ICT, networked learning, and artificial intelligence.” Círculo de Lingüística Aplicada a la Comunicación 76: 13–30. https://doi.org/10.5209/CLAC.62495.
Kartal, Galip. 2023. “Contemporary language teaching and learning with ChatGPT.” Contemporary Research in Language and Linguistics 1 (1): 59–70. https://doi.org/10.62601/crll.v1i1.10.
Kasneci, Enkelejda, Kathrin Sessler, Stefan Küchemann, Maria Bannert, Daryna Dementieva, Frank Fischer, Urs Gasser, Georg Groh, Stephan Günnemann, Eyke Hüllermeier, Stephan Krusche, Gitta Kutyniok, Tilman Michaeli, Claudia Nerdel, Jürgen Pfeffer, Oleksandra Poquet, Michael Sailer, Albrecht Schmidt, Tina Seidel, Matthias Stadler, Jochen Weller, Jochen Kuhn, and Gjergji Kasneci. 2023. “ChatGPT for good? On opportunities and challenges of large language models for education.” Learning and Individual Differences 103: 102274. https://doi.org/10.1016/j.lindif.2023.102274.
Knaus, Thomas. 2024. “Künstliche Intelligenz und Pädagogik – ein Plädoyer für eine Perspektiverweiterung.” Ludwigsburger Beiträge zur Medienpädagogik – LBzM 24: 1–34. https://doi.org/10.21240/lbzm/24/11.
Licardo, Marta, and Alenka Lipovec, eds. 2024. Artificial Intelligence Literacy and Social-emotional Skills as Transversal Competencies in Education. Verlag Dr. Kovač.
Marr, Bernard. 2018. “The key definitions of artificial intelligence (AI) that explain its importance.” Forbes, February 14. https://www.forbes.com/sites/bernardmarr/2018/02/14/the-key-definitions-of-artificial-intelligence-ai-that-explain-its-importance/.
Miao, Fengchun, Wayne Holmes, Huang Ronghuai, and Hui Zhang. 2021. AI and Education: Guidance for policymakers. UNESCO Publishing.
OECD. AI Policy Observatory. n.d. “OECD AI Principles overview.” Archived July 2, 2023. https://oecd.ai/en/ai-principles.
Pettersson, Jenny, Elias Hult, Tim Eriksson, and Tosin Adewumi. 2024. “Generative AI and teachers – for us or against us? A case study.” arXiv:2404.03486. https://doi.org/10.48550/arXiv.2404.03486.
Ramge, Thomas. 2018. Mensch und Maschine. Wie künstliche Intelligenz und Roboter unser Leben verändern. Reclam.
Renz, André, Swathi Krishnaraja, and Elisa Gronau. 2020. “Demystification of artificial intelligence in education. How much AI is really in the educational technology?” International Journal of Learning Analytics and Artificial Intelligence for Education 2 (1): 14–30. https://doi.org/10.3991/ijai.v2i1.12675.
Rubanau, Ihar. 2024. “Artificial intelligent seasons.” IIoT World, June 21. https://www.iiot-world.com/artificial-intelligence-ml/artificial-intelligence/artificial-intelligent-seasons/.
Strasser, Thomas. 2020. “Künstliche Intelligenz im Sprachunterricht. Ein Überblick.” Revista Lengua y Cultura. Biannual Publication 1 (2): 1–6. https://dialnet.unirioja.es/servlet/articulo?codigo=9114327.
Thorp, H. Holden. 2023. “ChatGPT is fun, but not an author.” Science 379 (6630): 313. https://doi.org/10.1126/science.adg7879.
Tica, Lena, and Ivana Krsmanović. 2024. “Overcoming the writer’s block? Exploring students’ motivation and perspectives on using ChatGPT as a writing assistance tool in ESP.” ELOPE: English Language Overseas Perspectives and Enquiries 21 (1): 129–49. https://doi.org/10.4312/elope.21.1.129-149.
UM:NIK. 2025. https://plus.cobiss.net/cobiss/um/sl/bib/search.
Vodafone. 2023. “Aufbruch ins Unbekannte.” Vodafone Stiftung, April 20. https://www.vodafone-stiftung.de/ki-an-schulen/.
Wong, K. Gary, Xiaojuan Ma, Pierre Dillenbourg, and John Huan. 2020. “Broadening artificial intelligence education in K-12: Where to start?” ACM Inroads 11 (1): 20–29. https://doi.org/10.1145/3381884.
Yuan, Yijia. 2023.
“An empirical study of the efficacy of AI chatbots for English as a foreign language learning in primary education.” Interactive Learning Environments 32 (10): 6774–89. https://doi.org/10.1080/10494820.2023.2282112.
Žerovnik, Alenka, and Matej Zapušek. 2024. Uporaba generativne umetne inteligence v izobraževanju. Založba UL Pedagoške fakultete. https://zalozba.pef.uni-lj.si/index.php/zalozba/catalog/book/226.

Attitudes of Primary and Secondary EFL Teachers in Croatia Towards the Use of AI in Classroom Settings

ABSTRACT
The use of artificial intelligence (AI) in language learning has rapidly increased with the widespread popularity of generative AI tools such as ChatGPT. Research highlights the need for school-age learners to develop digital literacy skills to engage critically and responsibly with AI-based tools. Equally important is the role of (language) teachers, who must possess the skills necessary to guide students in navigating and leveraging this technology effectively. This exploratory study investigates the extent of EFL teachers’ knowledge and their attitudes toward using AI tools for language learning. Focusing on primary and secondary school EFL teachers in Croatia, the study aims to shed light on their perspectives on and preparedness for the integration of AI into the language classroom, addressing a critical aspect of modern education and contributing to a deeper understanding of what educators need to successfully incorporate AI into their teaching.

Keywords: artificial intelligence (AI), teacher attitudes, EFL, digital competence, teaching methods, primary school, secondary school

Pogled osnovno- in srednješolskih učiteljev in učiteljic angleščine kot tujega jezika na Hrvaškem na uporabo UI pri pouku

IZVLEČEK
Uporaba umetne inteligence (UI) pri učenju jezikov je močno narasla z razmahom generativnih orodij UI, kot je ChatGPT.
Raziskave poudarjajo potrebo po digitalni pismenosti učencev in učenk za kritično in odgovorno uporabo orodij UI. Prav tako je pri tem ključna vloga učiteljev in učiteljic (jezikov), ki morajo imeti ustrezna znanja za učinkovito usmerjanje učečih se pri uporabi te tehnologije. Ta študija ugotavlja raven znanja in stališča učiteljev in učiteljic angleščine kot tujega jezika (EFL) do uporabe orodij UI pri jezikovnem pouku. Raziskava osvetljuje stališča in pripravljenost učiteljev in učiteljic osnovnih in srednjih šol na Hrvaškem za vključevanje UI v jezikovni pouk ter prispeva k razumevanju njihovih potreb za uspešno integracijo UI v učni proces.

Ključne besede: umetna inteligenca (UI), stališča učiteljev in učiteljic, angleščina kot tuji jezik, digitalne kompetence, učne metode, osnovna šola, srednja šola

2025, Vol. 22 (1), 133–150(228)
journals.uni-lj.si/elope
https://doi.org/10.4312/elope.22.1.133-150
UDC: [811.111’243:37.091.3(497.5)]:004.8

Bojan Prosenjak, University of Zagreb, Croatia
Eva Jakupčević, University of Split, Croatia

1 Introduction
AI-based tools have gained traction in all areas of life in the last few years, with education and language learning being no exceptions. The ever-changing digital landscape of the 21st century has brought about a shift, necessitating a redefinition of both the roles of stakeholders and the methodologies used in education (Aghaziarati, Nejatifar, and Abedi 2023, 35). It has become crucial for teachers to possess the knowledge and understanding to effectively assess and implement these tools in their classes (Luckin et al. 2022, 2).
Research has highlighted the importance of developing digital literacy skills in school-age learners to ensure their critical and responsible use of AI-based tools, which means that teachers need to have skills adequate to support and guide students in their exploration of this technology (Gisbert Cervera and Caena 2023; Javier and Moorhouse 2023). However, studies have indicated that teachers might not possess these skills and may even harbour negative perceptions related to AI (Kohnke, Moorhouse, and Zou 2023b; Nazaretsky et al. 2022, 916). AI-powered tools offer multiple opportunities for language learning and teaching, which may be particularly advantageous in EFL contexts, where students usually have limited access to authentic language use outside the classroom. Among other benefits, AI-based applications can provide learning feedback, help with translation, aid teachers in creating activities and scenarios for language learning, and support language assessment (Creely 2024, 158). However, despite the increasingly recognised potential of AI for language learning, its effective integration in education depends on teacher attitudes towards the technology, which influence both their methods and student outcomes (Yue, Jong, and Ng 2024, 19510). Therefore, investigating teachers’ perceptions of AI tool integration in EFL education across diverse settings is both pertinent and timely. Notably, there is a lack of research on this topic in the context of Croatian primary and secondary schools, a gap that the present study seeks to address. 2 Theoretical Background As AI becomes more widespread in education, there has been a growing body of research examining the advantages and potential drawbacks of using AI-driven tools in language learning (e.g., Chiu et al. 2023; Creely 2024; Javier and Moorhouse 2023; Kohnke, Moorhouse, and Zou 2023b; Rebolledo Font De la Vall and González Araya 2023). 
AI-powered resources for language learning may include language tutoring systems that give personalised feedback; generative AI tools that can generate text or activities adapted to specific levels or groups of students; text-to-speech software; image creation software, etc. (Rebolledo Font De la Vall and González Araya 2023, 7569). Studies indicate that there is potential in these tools for improving language learning results (Liang et al. 2023). For example, AI-powered tools can offer a space for interaction for EFL learners, who often have limited opportunities to use the target language in their everyday lives, and chatbots have been found effective in improving students’ oral proficiency in English as well as their willingness to communicate in the L2 (Timpe-Laughlin, Sydorenko, and Daurio 2022; Yuan 2023). Personalisation has been seen as another benefit of AI for language education, with AI-based assessment and feedback tools potentially offering a more adaptive and targeted learning experience (Yesilyurt 2023). In addition to these benefits for learners, AI can support teachers in enhancing their teaching capabilities, developing adaptive strategies, and advancing their professional development (Chiu et al. 2023). On the other hand, numerous challenges associated with the use of AI, both in education broadly and in language learning specifically, have been identified, such as ethical concerns, for example, those related to authorship of content generated by AI, or the lack of transparency in its use (Creely 2024). Data privacy and bias have also frequently been mentioned in relation to the copious amounts of data analysed by AI-based systems (Kohnke, Moorhouse, and Zou 2023b).
Other concerns include potential overreliance on technology in assessment, which could result in the loss of “nuanced, empathetic feedback and the vital interpersonal connection between educators and learners” (Yesilyurt 2023, 33). Moreover, relying on AI tools to track learning progress could influence students’ capacity to build self-regulation skills, which are essential for lifelong learning (Molenaar 2022). The appropriate implementation of AI-based tools will therefore require language teachers to possess a specific set of skills and knowledge about both the potential advantages and limitations of AI (Kohnke, Moorhouse, and Zou 2023a), as the teachers’ lack of knowledge can hinder students’ development of digital competence (Nascimbeni and Vosloo 2019). The importance of the teachers’ perspectives is further emphasised in a study by Polak, Schiavo, and Zancanaro (2022), which involved teachers, school psychologists, and education managers from schools across four European countries. The study found that a strong willingness to learn about AI and incorporate digital tools into teaching is crucial for the effective integration of AI in education. While some studies mention negative attitudes towards AI among teachers, with reports of anxiety and concerns about the future (Chiu et al. 2023, 12), the majority indicate that teachers recognise both the advantages and limitations of using AI in education. For instance, teachers (N = 28) from diverse backgrounds in a study by Aghaziarati, Nejatifar, and Abedi (2023, 39) acknowledged AI’s potential to improve individualised learning and encourage innovative teaching methods, but they also raised issues regarding ethics as well as the need for infrastructure and ongoing training for teachers.
Similarly, a small-scale study of twelve university language instructors in Hong Kong by Kohnke, Moorhouse, and Zou (2023b) found that while participants were overall optimistic about the potential of AI-powered tools in language education, they lacked confidence in teaching students to use these tools productively and responsibly. They also voiced concerns regarding the ethical and practical difficulties linked to the adoption of AI technologies. Comparable results have been reported in other studies, where teachers view AI positively but regularly acknowledge their limited knowledge of its practical applications (Chounta et al. 2022; Galindo-Domínguez et al. 2024b; Polak, Schiavo, and Zancanaro 2022; Sütçü and Sütçü 2023). These findings underscore the widespread need for systematic teacher training, which should start at the pre-service level. For instance, while expressing optimism about the effect of AI on education and on teaching and learning EFL, Slovak pre-service English language teachers (N = 137) reported having no (61.31%) or limited (21.17%) knowledge of the fundamental principles of AI (Pokrivcakova 2023). Only 35.04% of the participants believed their understanding of AI-based tools for EFL teaching was sufficient, whereas most (64.24%) supported the idea of including AI education in their university studies. Research also reveals that an insufficient understanding of how AI technologies operate may limit the teachers’ ability to fully integrate them into learning, teaching, and assessment (Chiu et al. 2023, 12). For example, a study by Chounta et al. (2022) validated the perceived knowledge of 131 Estonian primary and secondary teachers about AI, using a questionnaire with statements related to the technology.
Most participants (57%) answered 60% of the questions correctly, and fewer correct answers were provided by teachers who considered themselves more knowledgeable, suggesting that misconceptions may be a hindrance to the implementation of AI. This is further supported by research conducted by Galindo-Domínguez et al. (2024b) on 445 Spanish teachers from diverse backgrounds, which found a positive relationship between teachers’ digital competence and their perceptions of AI. According to this study, teachers with greater digital competence tend to experience fewer difficulties when using educational technology, which in turn fosters a more positive attitude towards its integration. These findings once again accentuate the importance of providing teachers with training in both general digital competences and AI-related issues to enhance their confidence and effectiveness in using these tools. However, it must be emphasised that a low level of digital competence can be offset by a high level of motivation to learn about AI tools, which has been found to be among the key factors for incorporating AI into education (Polak, Schiavo, and Zancanaro 2022). Many teachers across different educational contexts express uncertainty about how to handle AI-related issues in education, underscoring the need for further research to determine the specific support required to help them navigate these challenges. While numerous studies have explored teachers’ attitudes towards AI in education in general, there is a scarcity of research focusing specifically on foreign language teachers working with primary and secondary school learners. As far as we are aware, no studies of this kind have been carried out in the Croatian context.
This gap in the literature highlights the importance of the present study, which aims to explore the attitudes of primary and secondary school EFL teachers in Croatia towards AI use in education, as well as their perceptions of the potential benefits and challenges that this technology presents in the field of foreign language teaching and learning.

3 The Present Study
3.1 Aim and Research Questions
The aim of the present study was to investigate the attitudes of primary and secondary school EFL teachers in Croatia towards the use of AI in language teaching and learning, as well as to explore their perspectives on the possible advantages and disadvantages brought by the implementation of this technology in the EFL classroom. To this end, the following research questions have been formulated:

RQ1: What are the attitudes of primary and secondary school EFL teachers in Croatia towards the use of AI in EFL teaching and learning?
RQ2: To what extent do the teachers’ age, length of teaching experience, and type of school where they work influence their attitudes towards the use of AI in EFL teaching?
RQ3: What are the perspectives of EFL teachers in Croatia on the potential advantages and disadvantages of incorporating AI in their EFL teaching practice?

3.2 Study Context
Students in Croatian schools are required to begin studying a foreign language, typically English, from the first grade (pupils aged 6/7). EFL is taught through two weekly lessons in lower primary (1st to 4th grade, seventy lessons per year) and three weekly lessons in upper primary (5th to 8th grade, 105 lessons per year), while the number of weekly lessons in secondary school depends on the type of school. To teach EFL in a Croatian school, teachers must hold an MA in EFL teaching or in primary education with a specialisation in teaching English to primary-aged children.
In recent years, there has been a growing recognition of the importance of artificial intelligence (AI) in education in Croatia. Notably, a handbook on AI in education has been published by the Agency for Electronic Media and UNICEF (2024), and curricula for elective subjects on AI for both primary and secondary schools have been developed by CARNET – the Croatian Academic and Research Network (2024a; 2024b). Additionally, a variety of webinars and resources have been made available to support teachers in integrating AI into their classrooms. However, to our knowledge, there is currently no systematic education about AI in the context of English language teaching or teacher education programs in Croatia.

3.3 Participants
The participants in our study were sixty-three primary and secondary school EFL teachers from across Croatia, five of whom were male and the rest female. Most of the participants were between 41 and 50 years old (Figure 1), of whom almost half worked in primary and half in secondary school (Figure 2).
Figure 1. Participants by age group (N = 63).

Figure 2. Participants by type of school of employment (N = 63).

Almost half the participants had been working in school for more than 20 years, and over a third for between 10 and 20 years (Figure 3).

Figure 3. Participants by length of teaching experience (N = 63).

3.4 Instruments
For the purposes of this study, two questionnaires were used. Questionnaire 1 was developed and validated in a study by Galindo-Domínguez et al. (2024b) with the aim of analysing teachers’ attitudes towards the use of AI in education (Appendix 1): the final scale consisted of 25 items divided into four factors or dimensions: willingness to use AI (items 1-3), attitude towards AI (items 4-10), professional expectations towards AI (items 11-20), and personal experiences with AI (items 21-25).
Questionnaire 1 required the participants to select one of the five values on a Likert scale for each of the twenty-five items, indicating the degree to which they agreed with the given statement. The higher the value for each item, the stronger the participants’ agreement with the statement, reflecting a more positive attitude of EFL teachers towards AI use in education. For items Q11 and Q21, the values were recoded before the analysis. Questionnaire 2 was designed by the authors of the present study, drawing on relevant literature and prior studies in the domain of teacher attitudes towards AI (Aghaziarati, Nejatifar, and Abedi 2023; Chounta et al. 2022; Kohnke, Moorhouse, and Zou 2023b).
Its purpose was to gather responses that would support the qualitative analysis of the results and answer the research questions posed in this study. It included eight open-ended questions which the participants were invited to answer but that remained optional (Appendix 2).

3.5 Data Collection and Analysis
The data for this study was collected in the winter semester of the school year 2024/2025. The questionnaires were sent to teachers across Croatia by email and were posted on Facebook groups for primary and secondary school EFL teachers in Croatia. The introductory section outlined the goal of the study and made it clear that the teachers’ responses would help in achieving this goal. Next, it was stated that participation was voluntary and anonymous, and that the participants could withdraw at any moment. Apart from the two questionnaires, the participants’ demographic data was also collected – their profession, gender, age, type of school where employed, and the length of their teaching experience. To address RQ1, the participants’ responses from Questionnaire 1 were analysed quantitatively using descriptive statistics. The responses were also grouped into four factors, as identified by Galindo-Domínguez et al. (2024b), and the mean values for each factor were calculated. Next, a multiple linear regression analysis was performed to answer RQ2. This analysis examined the relationship between the participants’ average questionnaire scores and three predictors: age, type of school, and length of teaching experience. Questionnaire 2 was analysed qualitatively, using a thematic analysis approach. The two authors independently reviewed the teachers’ responses and identified recurring themes based on frequency and relevance.
This was followed by a collaborative process in which the findings were compared and discussed until a consensus was reached on the key themes that emerged from the data. This iterative process ensured that the analysis was thorough and reflective of the teachers’ perspectives. Focusing on the themes most frequently mentioned ensured that the key insights were captured while maintaining the richness of the data.

3.6 Results and Discussion
To determine the attitudes of EFL teachers in Croatia towards the use of AI in EFL teaching and learning (RQ1), the participants’ responses in Questionnaire 1 were analysed. After categorising the responses into four factors as identified by Galindo-Domínguez et al. (2024b), we found all four mean values to be above the scale midpoint, although varying across the factors (Table 1). The findings indicate that most participants in the present study are willing to use AI in their classes (Factor 1) and have a positive attitude towards it (Factor 2). However, fewer teachers have positive professional expectations regarding AI (Factor 3), and an even smaller number have had positive firsthand experiences with it (Factor 4). This pattern mirrors the results obtained by Galindo-Domínguez et al. (2024b) for the four factors, with Factor 1 rated the highest, and Factor 4 the lowest by the Spanish teachers in their study. However, unlike our participants, teachers in Spain exhibited only moderately high or neutral values for Factors 1, 2 and 3 (3.73, 3.60, and 3.33 respectively) and a moderately low value for Factor 4 (2.24). In other words, the participants in our study expressed greater willingness to use AI in their classes and more positive attitudes towards the technology compared to the Spanish teachers.
While they reported lower values regarding their professional expectations of the technology, these values were still higher than those of their counterparts in Galindo-Domínguez’s study. Despite this difference, the results highlight the need for providing teachers with training and examples of effective practices for using AI in EFL classrooms, which will be discussed in more detail at a later point. The results for individual questionnaire items (Table 2) provide more detailed insight into the teachers’ attitudes. The highest mean value was recorded for statement 4 (“I am interested in learning about artificial intelligence in education.”), followed by statement 5 (“I am interested in exploring the use of artificial intelligence as a complementary tool for my teaching practice.”) and statement 3 (“I would love to be able to use artificial intelligence in my work as a teacher.”). These results indicate that most teachers in our study are eager to learn about how AI can be implemented in their lessons and how they can use it in class. On the other hand, statement 23 showed the lowest mean value (“I have extensive experience with the use of artificial intelligence in education.”), followed by statements 24 (“I can share my knowledge and skills about artificial intelligence with other teachers.”), and 13 (“Artificial intelligence will positively revolutionise education.”). These findings suggest that many teachers lack sufficient experience with AI in their classrooms, limiting their ability to share their expertise with colleagues. Additionally, a sizeable number of participants expressed scepticism about AI’s potential to revolutionise education. The results of our quantitative analysis are largely in line with previous studies, where teachers reported generally positive attitudes towards the use of AI in education but also noted a lack of the skills necessary to implement AI tools successfully in their teaching (e.g., Chounta et al. 2022; Galindo-Domínguez et al. 
2024a, 2024b; Pokrivcakova 2023; Polak, Schiavo, and Zancanaro 2022; Sütçü and Sütçü 2023). However, as opposed to teachers in some studies, for example, the pre-service teachers in Pokrivcakova (2023), these participants were not entirely optimistic about AI’s potential to revolutionise education. Their cautiously positive stance is further elaborated on in their responses to Questionnaire 2 and reflected in other studies, such as Sütçü and Sütçü (2023), where Turkish EFL teachers of university preparatory classes also showed awareness of the advantages of AI along with concerns regarding its potential adverse effects on education. The teachers in the study by Kohnke, Moorhouse and Zou (2023b) also expressed caution and emphasised the overall lack of structured training and consistent information from their institution.

Table 1. Mean values of participants’ answers to Questionnaire 1 per factor (N = 63).

Factor   Minimum   Maximum   M      SD
1        1.00      5.00      4.17   1.028
2        1.71      5.00      4.09   0.895
3        1.60      5.00      3.71   0.831
4        1.00      4.80      3.27   1.030

These results further underscore the importance of teacher training, not only in AI-based tools but also in general digital competences, as highlighted by Kohnke, Moorhouse and Zou (2023a), since studies have shown that attitudes towards and implementation of AI in the classroom depend on the teachers’ confidence in using such tools (Galindo-Domínguez et al. 2024a). The results of the regression analysis conducted to examine the extent to which the teachers’ age, length of teaching experience, and type of school where they work influence their attitudes towards AI use in EFL teaching (RQ2) indicate that the overall model was not statistically significant (F(3,59) = 1.229, p = .307). This suggests that the type of school, teachers’ age, and the length of their teaching experience do not significantly explain the variance in the average teacher score.
Therefore, the answers given in the questionnaires by all teachers who took part in the present study could be treated equally, regardless of how old they were or how long they had been working in either a primary or secondary school.

Table 2. Mean values of participants’ answers to Questionnaire 1 (N = 63).

Question   Minimum   Maximum   M      SD
Q1         1         5         4.11   1.049
Q2         1         5         4.13   1.114
Q3         1         5         4.27   1.050
Q4         1         5         4.44   0.894
Q5         1         5         4.35   1.003
Q6         1         5         3.65   1.246
Q7         1         5         4.02   1.085
Q8         2         5         4.17   0.959
Q9         1         5         3.87   1.085
Q10        1         5         4.10   0.979
Q11        1         5         4.03   1.047
Q12        1         5         3.17   1.100
Q13        1         5         3.11   1.094
Q14        1         5         3.67   1.092
Q15        1         5         3.86   0.913
Q16        1         5         4.00   1.000
Q17        1         5         3.87   0.959
Q18        2         5         4.13   0.852
Q19        1         5         3.37   1.005
Q20        1         5         3.87   0.924
Q21        1         5         4.06   1.268
Q22        1         5         3.52   1.105
Q23        1         5         2.54   1.162
Q24        1         5         2.70   1.328
Q25        1         5         3.54   1.330

These results reflect those from other studies. For example, in Galindo-Domínguez et al. (2024a, 2024b), teachers’ attitudes towards AI did not differ based on the educational stage in which they were employed. However, when it comes to technology acceptance in general, results were more varied. Some studies have pointed out that younger teachers are more open to incorporating technology in their practice (e.g., O’Bannon and Thomas 2014; Joseph, Thomas, and Nero 2021), but this trend was not observed in our sample. Joseph, Thomas and Nero (2021) also found that more experienced teachers tended to use less technology in the classroom, a finding that contrasts with other studies, such as that conducted by Gu, Zhu and Guo (2013), who found novice teachers to be less reliant on technology. In other words, it appears that studies to date have found few consistent patterns regarding the influence of personal and sociodemographic factors on teachers’ attitudes towards technology, which is confirmed by our results.
Following the presentation of the quantitative findings, the teachers’ answers to the open questions in Questionnaire 2 (Appendix 2) were analysed thematically to provide more insight into the perspectives of EFL teachers in Croatia on the potential advantages and disadvantages of incorporating AI in their EFL teaching practice (RQ3). The thematic analysis of responses from Questionnaire 2 revealed three main categories of benefits: those associated with lesson planning and material design, those primarily benefiting students, and those related to assessment. Of thirty-six participants who answered the question about the potential of implementing AI in EFL teaching and learning, twenty mentioned its role in lesson planning and/or designing materials, with six highlighting timesaving as a key benefit. The terms ‘personalisation’ and ‘adaptation’ frequently appeared, referring to tailoring tasks, content, and programs to individual learners’ needs and interests, including those in mixed-ability classes and learners with special educational needs. This type of differentiation was also linked to increased student motivation and engagement, which corresponds to findings by Brinegar (2023). Six participants specifically discussed adapting materials in relation to the existing EFL curriculum, noting that AI enables the creation of multiple curriculum versions that can be improved, tailored, or individualised for specific students. In addition, AI was considered a fast way to generate creative ideas and a useful tool for generating tasks such as reading comprehension exercises (e.g., using ChatGPT), materials accompanying videos or audio recordings, interactive and visually engaging presentations (e.g., Canva), generating images and flashcards (e.g., from YouTube), creating dialogue scenarios, discussion questions, grammar tasks, quizzes, and gamified activities. 
These benefits of AI for lesson planning and creativity have also been recognised in previous studies (e.g., Chounta et al. 2022; Sütçü and Sütçü 2023). Regarding the second category of benefits, fifteen participants highlighted the advantages of AI primarily for students, particularly in providing additional help or guidance in learning. They noted that engaging, interactive, up-to-date, and diverse activities created with the help of AI can foster active student participation, resulting in more dynamic classes where students can develop unexpected solutions, which in turn enables exchanging experiences and networking among students. Some examples included AI facilitating faster and easier access to information, encouraging independent learning, boosting motivation, and offering instant feedback. Participants also highlighted AI’s potential to support speaking and pronunciation practice through voice recognition software, to develop critical thinking by enabling students to create their own content, and to teach them to craft written and spoken prompts. Finally, a participant mentioned AI’s potential for creating chatbots to tutor students, support students with learning difficulties, and provide additional assistance through assistive technology features. The final category of benefits identified by participants was assessment, with six participants highlighting it as having the greatest potential for AI integration in their lessons. They provided examples such as using AI to create adapted tasks, to design reading and listening comprehension tests for specific topics or vocations, and to develop formative and summative assessment rubrics (e.g., by using tools such as MagicSchool). AI was also seen as a way to make test correction faster, easier, and more precise (e.g., uploading existing rubrics).
Additionally, participants mentioned using AI to generate revision quizzes (e.g., via Kahoot and Quizlet), and two participants specifically highlighted its utility for writing descriptive grades and providing feedback for parents. The three groups of benefits identified in the participants’ answers align closely with those discussed in the current research on AI in education. For instance, in a recent systematic literature review of ninety-two articles, Chiu et al. (2023) identified four key domains where AI is beneficial: AI in student learning, AI in teaching, AI in administration, and AI for assessment. These domains mirror the categories found in our participants’ responses, suggesting that teachers are familiar with the major advantages AI can offer. The parallels between the findings in this study and the broader literature indicate that educators are not only aware of AI’s potential but also able to recognise its relevance across various aspects of their professional practice. However, it must be noted that most teachers focused on a limited set of benefits, primarily those relating to lesson planning and designing learning materials. This is unsurprising, since such time-saving methods are likely to resonate with teachers who juggle multiple tasks, including administrative duties. Additionally, these uses of generative AI are likely the most accessible and straightforward, while fewer teachers might recognise the more specific benefits, possibly those who have had additional training. Analysis of the participants’ responses also highlighted several concerns and challenges, which can be grouped into four categories, the first of which encompasses concerns about misuse and ethical issues. Unsurprisingly, many of the participants (21/41) focused on issues of plagiarism and cheating, mentioning the example of ChatGPT being used by students to draft essays and create presentations. Teachers also noted students’ misuse of AI for schoolwork, homework, and tests. 
Two participants even pointed out that teachers could be complicit in plagiarism by presenting AI-generated materials as their own. The second category centres on the potential dangers of overreliance on AI, or of using AI tools “without thinking,” particularly regarding its impact on critical thinking and creativity (7 participants). Three participants expressed concern that students’ need to invest little effort and time in some tasks nowadays could erode key skills, for example, research skills, problem-solving, critical thinking, summarising, writing essays, creating presentations, spelling, and translating. Some teachers noted that the ease of AI-generated tasks might create a false sense of achievement, further diminishing motivation and creativity. In this context, several of the teachers commented that AI should be applied “in the right measure.” While many teachers acknowledged AI as a potential tool for improving motivation, several expressed concern that overreliance on AI tools would reduce students’ overall motivation and creativity. One teacher even stated that reliance on AI would bring about a reduction in people’s intelligence because “we will not bother to do things anymore,” and another participant said the following: “I do not like the direction we’re heading in. We are losing our humanity.” The third category addresses the limitations and risks of AI tools, as well as concerns about digital literacy. Five participants mentioned the limitations of AI, such as age restrictions, bias, and the inability to always interpret context or provide accurate feedback, which can result in student frustration and even data privacy infringement. Several teachers (5/41) emphasised the need to use reliable AI tools, especially when doing online tasks and tests, and to educate students on what happens to their data online.
A few participants noted that the materials generated by AI needed adaptation, as they can be an excellent starting point but still require careful review before use in class. Furthermore, four participants raised concerns about low digital literacy among both students and teachers, stressing the necessity for better education and preparation to use AI effectively in the classroom. They also highlighted the considerable time needed to teach students how to learn using AI tools, with teachers often lacking training themselves. Additionally, two participants underscored the absence of instruction on the ethical considerations regarding the use of AI. The final category involves the broader social and professional implications of AI in education. Concerns were raised about the impact of AI on the teaching profession, with four participants expressing fear that AI could diminish the teacher’s role, particularly in assessment, where overreliance on AI could reduce insight into students’ individual needs. One participant also mentioned that overreliance on AI tools could reduce the role of the teacher and ultimately lead to teachers losing their work. Two participants pointed out the potential social alienation that could result from increased automation, stressing that AI cannot replace the vital emotional and social support provided by teachers. Additionally, the digital divide was identified as an issue, with some schools lacking the technology necessary to integrate AI effectively, leading to unequal educational opportunities. Luckin (2017, 3) also notes that “[t]he less able and poorer students in society are generally least well served by education.” This disparity could further marginalise some students or even entire schools, as one participant in the present study mentioned.
Finally, two participants made notable comments, with one stating, “I really don’t want to solve problems that could be prevented simply by not using AI,” and another remarking, “I think less damage would be done by getting rid of AI than working with it. But I do understand that what’s done is done, it’s here now, and the role of the teacher with time will be not to make humanity smarter through teaching, but to do their best to slow down the process of getting stupider.” The results of the qualitative analysis reveal that, in discussing the challenges presented by the integration of AI into their practice, most teachers in our study focused on ethical issues, particularly cheating. This is not surprising, as we can assume it is a concern they face daily. For example, secondary school teachers mentioned the issue of preparing students for the matura (school-leaving) exam, which includes an essay-writing task in its EFL component. Some teachers reported that students used generative AI to write their essays, leaving them unprepared for the exam. To address such issues, assessment practices will need to be ‘AI-proofed’ and better aligned with current trends and the needs of students in the 21st century. However, these changes need to start at the top. It is not enough for teachers to have the necessary skills; they also need support through curricula and more appropriate school-leaving exams. In terms of other challenges related to AI, which are frequently discussed in the literature, only a few teachers in our sample mentioned any. Examples include the widening digital divide (Luckin 2017; Chiu et al. 2023) and data privacy issues (Luckin 2017; Kohnke, Moorhouse, and Zou 2023b). This suggests that some teachers may lack in-depth knowledge of these critical issues, which should be addressed through appropriate professional development programs.
Indeed, many teachers stressed the need for training, both for themselves and for their students, a concern that is frequently highlighted in other studies (Chiu et al. 2023).

4 Conclusion

The results of our study reveal an overall positive outlook on the use of AI in EFL teaching and learning, with teachers expressing their readiness to integrate AI into their classes. However, their professional expectations and firsthand experiences with AI were less positive. This is not unexpected, as educational technology has often sparked controversy among educators (Wegerif and Major 2023), especially when seen as a threat to students’ learning agency (Han et al. 2024). These concerns underscore the importance of integrating AI thoughtfully, in ways that support, rather than diminish, the vital role of teachers in promoting student autonomy and engagement. Further analysis revealed that these teachers lack experience with AI in education and are unable to share their skills with peers. This finding is in line with previous studies (e.g., Chounta et al. 2022; Pokrivcakova 2023; Polak, Schiavo, and Zancanaro 2022; Sütçü and Sütçü 2023), where teachers reported having limited knowledge of how to implement AI in their teaching. Given that AI-based tools have only recently become more widely recognised, this lack of familiarity is not surprising. While the teachers were generally positive about AI, they remained cautious about its potential to revolutionise education. This caution was echoed in the qualitative part of the study, where several teachers emphasised the importance of exercising restraint in integrating AI, ensuring it does not overwhelm the educational process. Additionally, no significant connection was found between teachers’ attitudes and the type of school, their age, and the length of experience in our sample, suggesting that AI training should be accessible to all teachers, regardless of their background.
Both primary and secondary school teachers, as well as those with varying levels of experience, displayed similar attitudes and concerns. Our qualitative analysis also revealed that while some teachers demonstrated a nuanced awareness of the advantages and disadvantages presented by AI, the majority focused on its potential for lesson planning and material generation, and on the risk of cheating and plagiarism. These results highlight the importance of developing training programs that encompass both the technical and ethical considerations related to AI, as pointed out by Kohnke, Moorhouse and Zou (2023a). Such training should not only empower teachers to use AI effectively but also equip them to instruct students about the responsible use of these technologies. These results are particularly relevant for university educators and pre-service teachers, since incorporating AI-related topics into teacher education programs is key to preparing future teachers to make effective use of AI in their practice and adapt to an evolving educational landscape. While the present study offers valuable insights, several limitations should be acknowledged. First, the limited sample size and its composition may affect the generalisability of the findings, especially as the volunteer participants may not accurately reflect the wider teacher population in Croatia. The sample included a disproportionate number of female participants, reflecting the general gender trend in the teaching profession, but this imbalance may still limit the generalisability of the findings. Furthermore, participants were partially recruited via Facebook and other social media platforms, which may have led to a bias towards individuals already inclined to use technology.
Another limitation is that participants were primarily drawn from urban and semi-urban areas, and specific data on their regional background was not systematically collected. This prevented an in-depth exploration of potential regional differences (e.g., centre vs. periphery). These issues could be addressed in future research by including a larger and more diverse sample, with a focus on regional variation, and by exploring alternative recruitment methods to reach a broader range of participants. Additionally, since the data was based on self-reporting, there is further potential for bias. The study also focused on a limited set of variables, while other factors, such as prior exposure to AI or specific training, might have influenced attitudes. Finally, the study may not have fully accounted for variations in access to resources or institutional support, which could influence teachers’ willingness to adopt AI in their classrooms. Further studies could build on the findings by exploring additional variables, such as available training opportunities and institutional support, which could also have a considerable impact on shaping attitudes. Examining how access to technological resources influences teachers’ adoption of AI could provide an important context for understanding the barriers to and opportunities for AI implementation in schools. In conclusion, our study underscores the need for practical, consistent teacher training to translate the positive attitudes of EFL teachers towards AI into effective classroom practices. By addressing the advantages and disadvantages of AI, we can better prepare students with the resources and knowledge needed to engage with AI in the 21st century.

References

Agency for Electronic Media and UNICEF. 2024. Umjetna inteligencija u obrazovanju. https://www.medijskapismenost.hr/wp-content/uploads/2024/04/Umjetna-inteligencija-u-obrazovanju.pdf.
Aghaziarati, Ali, Sara Nejatifar, and Ahmad Abedi. 2023.
“Artificial intelligence in education: Investigating teacher attitudes.” AI and Tech in Behavioral and Social Sciences 1 (1): 35–42. https://doi.org/10.61838/kman.aitech.1.1.6.
Brinegar, Merrilee. 2023. “Chatbots as a supplementary language learning tool: Advantages, concerns, and implementation.” International Journal of Education and Social Science Research 6 (6): 223–30. https://doi.org/10.37500/IJESSR.2023.6615.
CARNET – Croatian Academic and Research Network. 2024a. Kurikulum fakultativnog predmeta za srednje škole Umjetna inteligencija: od koncepta do primjene. https://www.carnet.hr/pogledajte-kurikulume-o-umjetnoj-inteligenciji-za-osnovne-i-srednje-skole/.
—. 2024b. Kurikulum izvannastavne aktivnosti za osnovne škole Umjetna inteligencija: od koncepta do primjene. https://www.carnet.hr/pogledajte-kurikulume-o-umjetnoj-inteligenciji-za-osnovne-i-srednje-skole/.
Chiu, Thomas K.F., Qi Xia, Xinyan Zhou, Ching Sing Chai, and Miaoting Cheng. 2023. “Systematic literature review on opportunities, challenges, and future research recommendations of artificial intelligence in education.” Computers and Education: Artificial Intelligence 4: 100118. https://doi.org/10.1016/j.caeai.2022.100118.
Chounta, Irene-Angelica, Emanuele Bardone, Aet Raudsep, and Margus Pedaste. 2022. “Exploring teachers’ perceptions of artificial intelligence as a tool to support their practice in Estonian K-12 education.” International Journal of Artificial Intelligence in Education 32 (3): 725–55. https://doi.org/10.1007/s40593-021-00243-5.
Creely, Edwin. 2024. “Exploring the role of generative AI in enhancing language learning: Opportunities and challenges.” International Journal of Changes in Education 1 (3): 158–67. https://doi.org/10.47852/bonviewIJCE42022495.
Galindo-Domínguez, Héctor, Nahia Delgado, Lucía Campo, and Daniel Losada. 2024a.
“Relationship between teachers’ digital competence and attitudes towards artificial intelligence in education.” International Journal of Educational Research 126: 102381. https://doi.org/10.1016/j.ijer.2024.102381.
Galindo-Domínguez, Héctor, Martin Sainz de la Maza, Lucía Campo, and Daniel Losada. 2024b. “Design and validation of a multidimensional scale for assessing teachers’ perceptions towards artificial intelligence in education.” International Journal of Learning Technology (online first). https://doi.org/10.1504/ijlt.2023.10062094.
Gisbert Cervera, Mercè, and Francesca Caena. 2022. “Teachers’ digital competence for global teacher education.” European Journal of Teacher Education 45 (4): 451–55. https://doi.org/10.1080/02619768.2022.2135855.
Gu, Xiaoqing, Yuankun Zhu, and Xiaofeng Guo. 2013. “Meeting the ‘digital natives’: Understanding the acceptance of technology in classrooms.” Journal of Educational Technology & Society 16 (1): 392–402.
Han, Ariel, Xiaofei Zhou, Zhenyao Cai, Shenshen Han, Richard Ko, Seth Corrigan, and Kylie A. Peppler. 2024. “Teachers, parents, and students’ perspectives on integrating generative AI into elementary literacy education.” In CHI ‘24: Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, edited by Florian Floyd Mueller, Penny Kyburz, Julie R. Williamson, Corina Sas, Max L. Wilson, Phoebe Toups Dugas, and Irina Shklovski, 1–17. Association for Computing Machinery. https://doi.org/10.1145/3613904.3642438.
Javier, Darren Rey C., and Benjamin Luke Moorhouse. 2023. “Developing secondary school English language learners’ productive and critical use of ChatGPT.” TESOL Journal 15 (2): e755. https://doi.org/10.1002/tesj.755.
Joseph, Genimon Vadakkemulanjanal, Kennedy Andrew Thomas, and Alex Nero. 2021. “Impact of technology readiness and techno stress on teacher engagement in higher secondary schools.” Digital Education Review 40: 51–65. https://doi.org/10.1344/der.2021.40.51-65.
Kohnke, Lucas, Benjamin Luke Moorhouse, and Di Zou. 2023a. “ChatGPT for language teaching and learning.” RELC Journal 54 (2): 537–50. https://doi.org/10.1177/00336882231162868.
—. 2023b. “Exploring generative artificial intelligence preparedness among university language instructors: A case study.” Computers and Education: Artificial Intelligence 5: 100156. https://doi.org/10.1016/j.caeai.2023.100156.
Liang, Jia-Cing, Gwo-Jen Hwang, Mei-Rong Alice Chen, and Darmawansah Darmawansah. 2023. “Roles and research foci of artificial intelligence in language education: An integrated bibliographic analysis and systematic review approach.” Interactive Learning Environments 31 (7): 4270–96. https://doi.org/10.1080/10494820.2021.1958348.
Luckin, Rose. 2017. “Towards artificial intelligence-based assessment systems.” Nature Human Behaviour 1 (3): 0028. https://doi.org/10.1038/s41562-016-0028.
Luckin, Rosemary, Mutlu Cukurova, Carmel Kent, and Benedict Du Boulay. 2022. “Empowering educators to be AI-ready.” Computers and Education: Artificial Intelligence 3: 100076. https://doi.org/10.1016/j.caeai.2022.100076.
Molenaar, Inge. 2022. “The concept of hybrid human-AI regulation: Exemplifying how to support young learners’ self-regulated learning.” Computers and Education: Artificial Intelligence 3: 100070. https://doi.org/10.1016/j.caeai.2022.100070.
Nascimbeni, Fabio, and Steven Vosloo. 2019. Digital Literacy for Children: Exploring Definitions and Frameworks. UNICEF Office of Global Insight and Policy.
Nazaretsky, Tanya, Moriah Ariely, Mutlu Cukurova, and Giora Alexandron. 2022. “Teachers’ trust in AI-powered educational technology and a professional development program to improve it.” British Journal of Educational Technology 53 (4): 914–31. https://doi.org/10.1111/bjet.13232.
O’Bannon, Blanche W., and Kevin Thomas. 2014.
“Teacher perceptions of using mobile phones in the classroom: Age matters!” Computers & Education 74: 15–25. https://doi.org/10.1016/j.compedu.2014.01.006.
Pokrivcakova, Silvia. 2023. “Preparing teachers for the application of AI-powered technologies in foreign language education.” Journal of Language and Cultural Education 7 (3): 135–53. https://doi.org/10.2478/jolace-2019-0025.
Polak, Sara, Gianluca Schiavo, and Massimo Zancanaro. 2022. “Teachers’ perspective on artificial intelligence education: An initial investigation.” In CHI EA ‘22: Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems, edited by Simone Barbosa, Cliff Lampe, Caroline Appert, and David A. Shamma, 1–7. Association for Computing Machinery. https://doi.org/10.1145/3491101.3519866.
Rebolledo Font De la Vall, Roxana, and Fabián González Araya. 2023. “Exploring the benefits and challenges of AI-language learning tools.” International Journal of Social Sciences and Humanities Invention 10 (1): 7569–76. https://doi.org/10.18535/ijsshi/v10i01.02.
Sütçü, Selim Soner, and Elif Sütçü. 2023. “English teachers’ attitudes and opinions towards artificial intelligence.” International Journal of Research in Teacher Education (IJRTE) 14 (3): 184–93. https://doi.org/10.29329/ijrte.2023.598.12.
Timpe-Laughlin, Veronika, Tetyana Sydorenko, and Phoebe Daurio. 2022. “Using spoken dialogue technology for L2 speaking practice: What do teachers think?” Computer Assisted Language Learning 35 (5–6): 1194–217. https://doi.org/10.1080/09588221.2020.1774904.
Wegerif, Rupert, and Louis Major. 2023. The Theory of Educational Technology: Towards a Dialogic Foundation for Design. Taylor & Francis.
Yesilyurt, Yusuf Emre. 2023. “AI-enabled assessment and feedback mechanisms for language learning: Transforming pedagogy and learner experience.” In Transforming the Language Teaching Experience in the Age of AI, edited by Galip Kartal, 25–43. https://doi.org/10.4018/978-1-6684-9893-4.ch002.
Yuan, Yijia. 2023. “An empirical study of the efficacy of AI chatbots for English as a foreign language learning in primary education.” Interactive Learning Environments 32 (10): 6774–89. https://doi.org/10.1080/10494820.2023.2282112.
Yue, Miao, Morris Siu-Yung Jong, and Davy Tsz Kit Ng. 2024. “Understanding K–12 teachers’ technological pedagogical content knowledge readiness and attitudes toward artificial intelligence education.” Education and Information Technologies 29: 19505–36. https://doi.org/10.1007/s10639-024-12621-2.

Appendix 1

Questionnaire 1 (adapted from Galindo-Domínguez et al. 2024a)

1 I am willing to use artificial intelligence in my teaching practice.
2 I am willing to explore new opportunities for integrating AI into teaching and learning processes.
3 I would love to be able to use artificial intelligence in my work as a teacher.
4 I am interested in learning about artificial intelligence in education.
5 I am interested in exploring the use of artificial intelligence as a complementary tool for my teaching practice.
6 The growing development of artificial intelligence in education is exciting to me.
7 Artificial intelligence should be introduced as part of teacher training.
8 There are many potential benefits to applying artificial intelligence in education.
9 I will stay up to date with the latest utilities and applications of artificial intelligence.
10 I will continue learning about artificial intelligence.
11 I don’t see how artificial intelligence could be relevant to my teaching practice.
12 I am convinced that artificial intelligence will have a positive impact on education.
13 Artificial intelligence will positively revolutionise education.
14 I hope that artificial intelligence can help me engage my students more.
15 Artificial intelligence can be used to assist students.
16 Artificial intelligence can be used to support students with specific educational needs (special educational needs students, gifted students, etc.).
17 Artificial intelligence can promote more personalised teaching.
18 Artificial intelligence can be used to create more personalised teaching materials.
19 Artificial intelligence can help my students perform better on school assignments.
20 Artificial intelligence can facilitate assessment and provide feedback to my students.
21 I have never interacted with artificial intelligence in an educational or general context.
22 I have had positive experiences with the use of artificial intelligence in education.
23 I have extensive experience with the use of artificial intelligence in education.
24 I can share my knowledge and skills about artificial intelligence with other teachers.
25 I have had some experiences with the use of artificial intelligence in education.

Appendix 2

Questionnaire 2

1 What is your perspective on the potential of using AI in EFL lessons?
2 What problems do you see with using AI in EFL lessons?
3 What AI tools have you used or would like to use in class, if such tools exist?
4 Can you describe your experiences with using AI tools or materials in EFL lessons? Please describe the content and purpose of their use.
5 How does AI affect EFL curriculum/syllabus design and lesson planning? Can you name some examples?
6 How do you think AI influences EFL teaching strategies and student motivation? Can you describe some examples?
7 In your opinion, what are some ethical and social problems of using AI in (EFL) lessons? How should they be dealt with by the teachers and institutions?
8 How do you envisage the future of using AI in EFL lessons? How can teachers prepare themselves and their students?

Part V: Translation Studies

2025, Vol.
22 (1), 153-170(228)
journals.uni-lj.si/elope
https://doi.org/10.4312/elope.22.1.153-170
UDC: [811.111’25=163.6:33]:004.8

Nataša Gajšt
University of Maribor, Slovenia

Applications of AI-driven Tools in Translating and Drafting Commercial Correspondence – A Slovenian-English Perspective

ABSTRACT

The recent emergence and widespread use of AI-driven tools have significantly affected various aspects of human communication, including business-related professional communication. This pilot study explores how AI-driven tools can be used in drafting commercial correspondence while observing its genre conventions. To this end, we carried out a small-scale study to assess AI-driven tools for translating and drafting commercial correspondence. We used ChatGPT, Claude 3.5 Sonnet and Gemini 2.0 Flash to translate 15 letters from Slovenian into English and to draft 10 letters in English based on prompts in Slovenian. Our key findings show that although the translations are similar, slight differences occur mainly at the level of formality and in the scope of formulaic expressions. Concerning the drafts, the AI-driven tools produced adequate letters which may occasionally require light human editing.

Keywords: Business English, commercial correspondence, translation, drafting, AI-driven tools, English, Slovenian

Uporaba orodij umetne inteligence pri prevajanju in sestavljanju poslovnih dopisov – slovensko-angleški vidik

IZVLEČEK

Nedavni razmah in obsežna uporaba orodij, ki temeljijo na umetni inteligenci (UI), imata velik vpliv na različne vidike človeške komunikacije, vključno s strokovno komunikacijo v poslovnem okolju. Ta pilotna študija ugotavlja, kako lahko orodja, ki jih poganja UI, uporabimo pri pisanju poslovne korespondence z upoštevanjem njenih žanrskih značilnosti. V ta namen smo izvedli manjšo raziskavo, v kateri smo ocenjevali orodja, ki temeljijo na UI, za prevajanje in oblikovanje poslovne korespondence.
Uporabili smo ChatGPT, Claude 3.5 Sonnet in Gemini 2.0 Flash za prevod 15 pisem iz slovenščine v angleščino ter za pripravo 10 pisem v angleščini na podlagi navodil v slovenščini. Naše ključne ugotovitve kažejo, da so prevodi med seboj razmeroma podobni, vendar se rahle razlike pojavljajo predvsem na ravni formalnosti in obsega rabe ustaljenih izrazov. Uporabljena orodja UI so pripravila ustrezna pisma, ki pa vendarle včasih potrebujejo manjše popravke.

Ključne besede: poslovna angleščina, poslovna korespondenca, prevajanje, pisanje, orodja UI, angleščina, slovenščina

1 Introduction

In the past couple of years, the surge in the use of AI-driven tools (e.g., ChatGPT) has greatly affected text production and, consequently, written communication. These tools can be used to prepare written documents for both general and professional purposes, including various types of business-related documents. Among the latter, they can serve as an aid in translating and drafting commercial correspondence. According to a recent study by Cardon et al. (2023), AI-driven tools have transformed the way people communicate for business purposes. Since several AI-driven tools can translate texts from one language to another, their use can be especially beneficial to businesses, given current trends of increasing internationalization of business operations and the consequent communication in, predominantly, English (Halimi and Shiyab 2015). Employees working in sales and purchasing may take advantage of these tools when communicating with business partners and customers in a language different from their first language. A substantial proportion of business communication is carried out in writing (Halimi and Shiyab 2015).
Thus, it is crucial that any commerce-related letter be appropriately structured and written in a clear and professional manner. A well-structured commercial letter or e-mail allows the recipient to easily understand the message and the action they need to take based on its content. A well-written message also demonstrates the sender’s professionalism and competence and their respect for the receiver. This professionalism adds to credibility and trust among business partners and customers and contributes to strong business relationships. On the other hand, a poorly written message can harm a company’s reputation. In other words, there is a correlation between well-written business-related communication and positive business results (Rogerson-Revell 2007, 1). The primary purpose of commercial correspondence is to address commerce-related matters (e.g., product enquiries, order confirmations, or complaints) (Ashley 2003). Therefore, it should be written clearly, concisely and without any ambiguities. Well-written commercial correspondence increases the chances of achieving the set goals, e.g., agreeing on proposed sales terms or concluding the sale. In short, commercial correspondence should be written in an appropriate professional tone, observing genre conventions and using accurate specialised, sales-related terminology (Talbot 2009; Wilson and Wauson 2010; Sankrusme 2017). In light of the above, the overall goal of this paper is to explore the ways in which AI-driven tools can be used to (1) translate commercial correspondence from Slovenian into English, and (2) draft commercial correspondence in English based on prompts in Slovenian, by examining specific elements related to the genre conventions of commercial correspondence. First, we present the theoretical framework for our study. Second, we describe how we carried out our research. Third, we present and discuss the results of the study.
In the final part, we summarize our findings and propose potential areas of further research together with implications for practice.

2 Theoretical Framework

This theoretical framework first gives an overview of commercial correspondence as a specific text type and text genre. Second, it examines the application of AI-driven tools in the context of translating and drafting commercial correspondence in English.

2.1 Commercial Correspondence as Text Type and Text Genre

Commercial correspondence refers to professional written texts related to the sale and purchase of goods and the provision of services. At its core, it is a communication channel between buyers and sellers as the two key participants in commercial transactions. Its purpose is to address aspects of commercial transactions: enquiries and replies to enquiries regarding general terms of sale and terms of payment, quotations, placing of orders and replies to orders, complaints and replies to complaints, etc. (Abegg and Benford 1999a, 1999b; Armitage-Amato 2005; Ashley 2003; Sankrusme 2017; Bennie 2021). Viewed through the prism of systemic functional linguistics (Halliday and Matthiessen 2004, 61), commercial correspondence is created for specific communication purposes within the business context (the ideational level). Commercial correspondence also creates the relationship between the seller and the buyer by laying down their rights and obligations (the interpersonal level). The third level, the textual level, is the actual linguistic realisation of the purpose of the message and of the interpersonal relationship between the two parties. This level is shaped by the lexical and grammatical characteristics and the purpose of commercial correspondence, and is realized through the typical structure of commercial letters.
This view of commercial correspondence shows that it needs to be considered both as a text type and as a text genre. As a text type (see Krajnc Ivič (2020) for a definition), commercial letters can be classified as a professional text type because they integrate the use of specialized sales-related terminology and form part of written business discourse that serves to fulfil specific tasks or functions. Via commercial correspondence, a company builds rapport with partners, suppliers, and customers, thus establishing and maintaining sales-related cooperation. More specifically, commercial correspondence is used to convey specific information (e.g., product or service details, prices, discounts, terms of delivery, or terms of payment), to negotiate and confirm sales-related agreements (e.g., stating and negotiating the terms and conditions of sale of a particular good or service), or to address any issues arising from non-performance of either the seller’s or the customer’s obligations (e.g., dealing with customer complaints, delivery or payment delays, or faulty products) (Davis 2010). Like the ideational, interpersonal and textual levels of texts, the concepts of text type and text genre are also interrelated (Krajnc Ivič 2020). While text types are defined via the functions of a specific group of texts (i.e., the ideational and interpersonal levels), text genres are defined via the structure of this same group of texts (i.e., the textual level). As a specific text genre within the broader context of business-related communication, commercial correspondence should adhere to its established structural and linguistic conventions. Above all, commercial correspondence letters should follow a clear structure, which includes the salutation, the main body (the message of the text) and an appropriate closing (Ashley 2003; Lougheed 2003; Taylor 2012).
Although the content of these letters varies, it is recommended that the information be presented in a clear and logically structured way. If the contents are complex, bullet points may be used to increase the readability of the text (Wilson and Wauson 2010). The structure of commercial correspondence letters is rather uniform and generally consists of four sections: the introduction, the core of the letter, the action required based on the letter, and a polite and positive ending. In the introduction, the sender frames the message in a context known to both the sender and the receiver (e.g., a reference to an advertisement, or to previous contact or correspondence). The core of the letter addresses the reasons for writing (e.g., an enquiry about a product, or a reminder about a payment) and guides the reader to the next section, which provides information about the action that is expected from the receiver based on the previous section (e.g., sending a reply with the requested product information, or addressing concerns about late payment). The body of the letter ends with a polite and positive conclusion in which the sender expresses gratitude for the reader’s attention to the letter, a desire for continued cooperation, and a clear indication of the next steps. As regards the language of commercial correspondence, several key observations should be made. As professional written communication, commercial correspondence should primarily be written in a professional tone. That is, the language used should be professional and polite, without colloquial expressions. However, an overly formal and somewhat outdated style of writing is also discouraged, particularly given the role of English as the lingua franca of the international business world (Terk 2016; Terk and Chan 2014; Wallwork 2014; Gajšt 2014).
The current trend in business writing leans towards a neutral, straightforward style (Abegg and Benford 1999b; Taylor 2012), which adds to the clarity and conciseness of the message (Wilson and Wauson 2010, 454; Carey 2002). Finally, the language in commercial correspondence letters should be polite to reflect respect and professionalism on the part of the sender. Linked to genre conventions and the professional tone and style of writing in English, two characteristics should be pointed out: the use of the passive voice and nominalization. In general, the passive voice is used to place focus on the action rather than on the doer of the action (e.g., when the doer of the action is unknown or irrelevant, when highlighting the doer may be sensitive in nature, or to avoid personal pronouns such as you or we) (Biber et al. 2021; Leech and Svartvik 1990; Quirk et al. 1985; Hribar 2021, 2018; Kalin Golob 2002). In commercial correspondence, the use of the passive voice may be appropriate in complaints, refusals or other types of messages where direct reference to the doer of the action may not be appropriate from a politeness standpoint (e.g., assigning ‘blame’). From the perspective of using plain English in business-related writing in an international context, the passive voice should be used only when absolutely needed (Bailey 1996; Taylor 2012). The second characteristic is the use of nominal structures. Nominalization is common in professional texts since it adds to the formality and conciseness of the message. Like the passive voice, it depersonalizes messages (‘They delayed the shipment.’ vs ‘There was a delay in shipment.’), compacts them and adds to the formality of the text (‘The shipment of the purchased goods will begin next month.’ vs ‘We will begin shipping the purchased goods next month.’). However, nominal structures may make a text more difficult to read; in such cases, verbal structures are preferred.
Summing up, good business writing in English in the international context should be polite, accurate, brief and clear (i.e., written in plain English and in an easy, natural style).

2.2 AI-Driven Tools for Text Drafting and Translation

Today, AI-driven tools which can be used either to translate a text into English or to write it in English based on a prompt in another language are widely available. They can perform a wide variety of tasks, from grammar checks to creating written texts without much human intervention (Marzuki et al. 2023, 2). Several studies have examined the usefulness of these tools for text production and text translation. Most of these address such tools in a pedagogical context, as an aid in developing writing or translation skills in a foreign language. Several studies have shown that students favour the use of AI as an aid in their learning, which was also supported by the results of writing tests and improved language proficiency (O’Neill 2016; Emara 2024; Kruk and Kałużna 2025). On the other hand, some studies have shown that the overuse of AI translation systems, despite saving time and increasing efficiency, can impair the development of independent writing and hinder critical thinking and deeper learning (Jaruwatsawat et al. 2024). That is, overreliance on AI-driven tools may turn users into passive users of these tools. AI-driven tools for text production and text translation have both strengths and weaknesses. Regarding their strengths, they are fast, easily accessible and cost-effective (Saitkhanova 2024; Moneus and Sahari 2024). They are designed to continuously evolve and improve their output with every user interaction (e.g., regarding linguistic patterns and idiomatic expressions). In addition, they can translate between multiple languages, which caters for diverse translation needs (Saitkhanova 2024; Suhardiman et al. 2024).
In contrast, the main reported weaknesses or limitations of AI-driven tools for translation lie in contextual understanding, cultural sensitivity and the capacity to deal with complex documents. They have a limited ability to understand nuances in language, idiomatic expressions and metaphors, and are not always able to fully comprehend cultural references, which may result in inappropriate translations. When it comes to complex documents or highly specialized texts (e.g., medical, technical, or legal), AI-driven tools do not show a high degree of accuracy, for instance when translating highly specialized terminology (Moneus and Sahari 2024; Suhardiman et al. 2024; Alisherovich 2024). The challenges related to the uncritical use of AI-powered tools in translation thus demand a careful and critical evaluation of AI-translated texts in the post-translation stage to detect and eliminate inaccuracies (Ning and Ban 2024). Orel Kos (2024) reports a similar finding in a study concerning translation for the screen, where subtitles produced with the help of machine translation are often of lower quality than those done exclusively by humans. Although AI offers quick, cost-effective translation, human post-editing of AI-translated texts has proven beneficial for specialized texts such as legal documents, marketing materials, technical manuals, and business reports, as it improves the linguistic accuracy, clarity and cultural appropriateness of the texts (Suhardiman et al. 2024).

2.2.1 AI-Driven Tools for Translation and Drafting of Commercial Correspondence – Case Selection

For our study, we decided to use the following AI-driven tools: ChatGPT 4 (OpenAI n.d.), Claude 3.5 Sonnet (Claude.ai n.d.), and Gemini 2.0 Flash (Gemini 2024), since these offer mostly free access and can thus be used by anyone.
Moreover, they can be used both for translating texts from Slovenian into English and for drafting letters in English based on prompts in Slovenian. To get a better understanding of these three AI-driven tools, we prompted them to describe their own scope, strengths and weaknesses regarding their ability to translate and draft commercial correspondence letters. Our prompt was: “How would you describe yourself in terms of your scope, strengths and weaknesses when it comes to writing and translating commercial correspondence?” The responses are given in Tables 1 to 3 below.

Table 1. Self-described scope of the selected AI-driven tools concerning commercial correspondence translation and production.

ChatGPT
• writes and translates commercial correspondence, including business inquiries, quotations, orders, contracts, invoices, and complaint letters,
• ensures clarity, professionalism, and adherence to international business standards,
• human editing advisable to refine business communication.

Claude 3.5 Sonnet
• can assist with virtually all forms of written business communication, including formal business letters, emails, memoranda, reports, proposals, meeting minutes, and basic contractual documents,
• can work across multiple languages for translation purposes,
• can adapt content for different regions and cultural contexts,
• can handle both internal and external communications, from entry-level to executive correspondence.
Gemini 2.0 Flash
• well-trained on formal language and professional communication styles,
• can adapt to different levels of formality as needed,
• training data includes a substantial amount of business-related vocabulary – to generate text with appropriate terminology,
• can produce various types of commercial correspondence, including letters, emails, memos, proposals, reports, and more,
• can adhere to standard formatting conventions,
• can translate commercial correspondence between multiple languages, facilitating international business communication.

(OpenAI n.d.; Claude.ai n.d.; Gemini 2024)

Table 1 shows that these AI-driven tools claim to be very good at creating formal and professional business communication. They can use extensive business vocabulary and are able to create different correspondence formats even in a multilingual setting, which makes them useful in a business context. Specifically, the responses provided by Gemini 2.0 Flash focused on formal and professional tone, business vocabulary, common correspondence formats and the ability to translate between several languages. ChatGPT’s focus was on specialized writing and translating various forms of commercial correspondence; additionally, it claimed a focus on clarity and professional tone in its output. However, it also highlighted the need to combine the AI-driven tool with human editing. Lastly, Claude 3.5 Sonnet claimed the ability to write and translate a wide range of commercial correspondence at different company hierarchy levels and across various regional and cultural contexts.

Table 2. Self-described strengths of the selected AI-driven tools concerning commercial correspondence translation and production.

ChatGPT
• leverages AI tools to speed up translation while maintaining human oversight for final accuracy,
• ensures that commercial correspondence is precise, avoiding ambiguity that could lead to misinterpretation or disputes,
• follows standard structures for different types of business communication, e.g., formal salutations, closing remarks, and polite requests,
• uses appropriate business and trade-related terminology, ensuring that terms related to payment, delivery, and warranties are correctly translated and applied,
• can adjust tone and phrasing to match English and Slovenian business etiquette, ensuring politeness and professionalism.

Claude 3.5 Sonnet
• is effective at restructuring complex information into clear, readable formats,
• can also provide detailed explanations for suggested improvements, helping users understand the reasoning behind specific writing choices,
• can write clear, professional business communications in multiple languages,
• can maintain consistent tone and formality levels,
• can follow standard business letter formats and conventions,
• can adapt style for different business contexts and cultures,
• can translate while preserving meaning and intent,
• ensures proper terminology in specialized fields (e.g., financial or legal).

Gemini 2.0 Flash
• can quickly draft and translate correspondence, saving time and resources,
• can generally produce grammatically correct and error-free text, essential for maintaining professionalism in business communication,
• can maintain a consistent tone and style throughout a series of communications, which is important for branding and building relationships,
• can generate neutral and unbiased language, avoiding emotional or subjective language that might be inappropriate in a business context,
• can easily adapt to different formats and templates for various types of correspondence.

(OpenAI n.d.; Claude.ai n.d.; Gemini 2024)

Table 3. Self-described weaknesses of the selected AI-driven tools concerning commercial correspondence translation and production.

ChatGPT
• can handle standard business agreements, but highly complex legal clauses may require additional review by a legal expert,
• points out that AI-generated commercial correspondence often requires human refinement to ensure appropriate tone and phrasing, particularly in sensitive negotiations,
• points out that some expressions do not translate directly between Slovenian and English – careful human adaptation is required to maintain professionalism.

Claude 3.5 Sonnet
• cannot maintain ongoing correspondence beyond single sessions,
• points out the need for human review of any legally binding documents,
• cannot independently verify company-specific protocols or requirements,
• points out that translations, while accurate for general business purposes, should be professionally reviewed for critical documents,
• cannot access real-time business information or company-specific information,
• cannot sign documents or provide legally binding translations.

Gemini 2.0 Flash
• can sometimes miss subtle cultural nuances in business communication, which can vary significantly across countries and industries,
• might not be familiar with highly specialized jargon used in specific industries (this can affect both writing and translation accuracy),
• does not have a deep understanding of complex business situations or strategies (which limits its ability to craft truly effective and persuasive correspondence in nuanced situations),
• cannot fully grasp the context of a long-term business relationship or the history of previous interactions,
• industry-specific idioms or colloquialisms might not always be accurate or appropriate.
(OpenAI n.d.; Claude.ai n.d.; Gemini 2024)

According to the information they provided, the three AI-driven tools are efficient, adaptable and fast in translating and drafting commercial correspondence. For example, both Gemini 2.0 Flash and ChatGPT highlighted their speed and efficiency as well as their overall accuracy in translating and drafting commercial correspondence. All three AI-driven tools claimed to be able to adhere to genre conventions (i.e., observing the standard structures and formats of different types of texts), which includes appropriate levels of formality and tone. Gemini 2.0 Flash specifically highlighted its grammatical accuracy. The outputs of the three AI-driven tools show similarities regarding their weaknesses in translating commercial correspondence, i.e., the inability to spot the nuances of business culture, and their lack of knowledge of highly specific jargon, business-related colloquialisms and idiomatic expressions. Also, they may struggle with maintaining contextual awareness over a long stretch of time. Some AI-driven tools also admitted their lack of actual experience of the business world and their lack of emotional intelligence. ChatGPT specifically pointed out its shortcomings and the need for human editing when it comes to legally complex texts. Based on this framework, we formulated the following research questions:

Research question 1: How effectively do the selected AI-driven tools translate commercial correspondence from Slovenian into English in terms of commercial correspondence as a text genre?

Research question 2: How effectively do the selected AI-driven tools generate commercial correspondence in English based on prompts in Slovenian in terms of commercial correspondence as a text genre?

3 Method

To answer our research questions, we designed a small-scale pilot study. We selected three freely available AI-driven tools: ChatGPT 4, Claude 3.5 Sonnet and Gemini 2.0 Flash.
We performed our analysis for the two research questions separately. Our open-ended approach enabled us to assess how closely the outputs adhered to the conventions of commercial correspondence as a text type and text genre. Concerning the first research question, we selected 15 commercial correspondence letters in Slovenian (enquiries, replies to enquiries, offers, quotations, and complaints). These were model letters we currently use to teach commercial correspondence in our Business English classes and were based on typical letters found in English-language commercial correspondence textbooks or guidebooks. We entered these letters into ChatGPT 4, Claude 3.5 Sonnet and Gemini 2.0 Flash and prompted the tools to translate them. We used the same prompt with all three tools: “Translate the following letters into English.” We deliberately kept the prompt as simple as possible. After we obtained the outputs, we analysed them based on predetermined criteria. Regarding commercial correspondence genre conventions, we limited our study to politeness, nominalization, the use of the passive voice, and ‘ease of reading’ (as related to the use of English as a lingua franca in business), in line with the observations on commercial correspondence and the strengths and weaknesses of AI-driven tools presented above. For the second research question, we selected 10 prompts (i.e., instructions) for drafting commercial correspondence in Slovenian (enquiries, replies to enquiries, offers, and complaints). As in the case of the letters used for the first research question, these instructions are model samples we use in our classes to teach commercial correspondence writing in English. We entered them into ChatGPT 4, Claude 3.5 Sonnet and Gemini 2.0 Flash to obtain the letters in English.
We used the same prompt with all three tools: “Draft the letter in English based on the prompt in Slovenian.” After we obtained the outputs (the drafted letters), we analysed them based on the guidelines for commercial correspondence found in English commercial correspondence textbooks and handbooks and on the strengths and weaknesses of AI-driven tools, in order to identify linguistic and contextual differences. As the final step in our analysis, we performed the Flesch-Kincaid and the Gunning Fog tests to see which of the AI-driven tools produced the texts (translations and drafted letters) that were the easiest to read and thus closest to the recommended clear, simple style for writing commercial correspondence (especially in the international context). The Gunning Fog test was designed to reduce unnecessary complexity in business writing (“Readability Checker - Reading Level Calculator” 2024; Miller 2024). The 0–100 scale for the Flesch-Kincaid test is as follows: 0–50 Very difficult (‘CEFRL C2 level’), 50–60 Fairly difficult (‘CEFRL C1 level’), 60–70 Plain English (‘CEFRL B2 level’), 70–80 Fairly easy (‘CEFRL B1 level’), 80–90 Easy (‘CEFRL A2 level’), 90–100 Very easy (‘CEFRL A1 level’) (Linguapress.com n.d.). The 0–20 scale for the Gunning Fog test is as follows: 1–5 very easy to read, 5–8 ideal for average readers, 8–11 fairly difficult to read, 11–20 hard to read for most readers. This scale was designed with the United States education system and its corresponding levels of education in mind, i.e., primary school to graduate levels (Clickhelp.com n.d.). The average results of these tests are given separately for the translations and for the drafted letters.

4 Results and Discussion

In this section of the paper, we present and discuss our findings.
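For readers who wish to replicate the readability scoring described in the Method section, both measures rest on simple published formulas. The sketch below is our own minimal Python illustration, not the checker cited above; in particular, the vowel-group syllable heuristic only approximates true syllable counts, so scores may differ slightly from those of dedicated readability tools.

```python
import re


def count_syllables(word: str) -> int:
    """Naive heuristic: count vowel groups, discounting a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Gunning Fog index) for an English text."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    n_words = len(words)
    n_syllables = sum(count_syllables(w) for w in words)
    # 'Complex' words for Gunning Fog: three or more syllables.
    n_complex = sum(1 for w in words if count_syllables(w) >= 3)
    # Standard published formulas for the two indices.
    flesch = 206.835 - 1.015 * (n_words / sentences) - 84.6 * (n_syllables / n_words)
    fog = 0.4 * ((n_words / sentences) + 100 * n_complex / n_words)
    return round(flesch, 1), round(fog, 1)
```

On short, monosyllabic sentences such a function yields a high Flesch score and a low Fog index, while heavily nominalized prose is pushed towards the ‘difficult’ end of both scales.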
4.1 Using AI-Driven Tools to Translate Commercial Correspondence from Slovenian into English

The first research question addressed the ability of ChatGPT, Claude 3.5 Sonnet and Gemini 2.0 Flash to translate commercial correspondence into English. Since we provided the three tools with whole letters, we did not focus on the structure of the letters per se but on the differences and similarities in line with the selected criteria. At the outset, we should state that ChatGPT’s outputs included the advice that human editing might be advisable. Although we did not specifically state in our prompts that the texts were commercial correspondence, Claude 3.5 Sonnet’s response to our prompt began with the information that it “will translate the letters while maintaining the formal business style and structure of the original” (Claude.ai n.d.) (N.B.: the tone for these translations was set to ‘formal’). This showed us that the tool recognised the letters as business-related. We present and discuss our findings in three areas: formality levels (including the notion of politeness), nominalisation, and the use of the passive voice. The examples illustrating our findings are taken from different sections of the letters: the opening, the main body, the concluding part and the complimentary close. We give the Slovenian sentences and their English equivalents as translated by the three selected AI-driven tools. Concerning the notion of politeness, we found that, overall, the translations were similar. However, they displayed slight differences in the adverbs used to soften or highlight the politeness of a statement or action. The following examples in Tables 4–7, taken from different types of letters, illustrate this point.

Table 4. Translating polite expressions – expressing interest.

Source (Slovenian): V najnovejši številki revije Sports&Health smo videli vaš oglas za športna oblačila in se zanimamo za vaše izdelke, še posebej za oblačila za zimske športe.
ChatGPT: We saw your advertisement for sportswear in the latest issue of Sports & Health magazine and are very interested in your products, particularly clothing for winter sports.
Claude 3.5 Sonnet: We saw your advertisement for sportswear in the latest issue of Sports&Health magazine and are interested in your products, particularly in clothing for winter sports.
Gemini 2.0 Flash: In the latest issue of Sports&Health magazine, we saw your advertisement for sportswear and are interested in your products, especially clothing for winter sports.

Table 5. Translating polite expressions – expressing gratitude.
Source (Slovenian): Iskrena hvala za vaš dopis z dne 20. junija 20__, v katerem povprašujete po asortimanu naši izdelkov, ki smo jih predstavljali na sejmu Apimell v Italiji prejšnji teden.
ChatGPT: Thank you very much for your letter dated June 20, 20__, in which you inquire about the range of our products that we showcased at the ‘Apimell’ fair in Italy last week.
Claude 3.5 Sonnet: Thank you for your letter dated June 20th, 20__, inquiring about our product range presented at the ‘Apimell’ trade fair in Italy last week
Gemini 2.0 Flash: Thank you very much for your letter of June 20, 2024, in which you inquire about the range of our products that we presented at the ‘Apimell’ fair in Italy last week

Table 6. Translating polite expressions – request for action.
Source (Slovenian): Ker bi želeli naročene izdelke ponuditi našim kupcem že v začetku septembra, vas prosimo, da potrdite naročilo v teku 8 delovnih dni.
ChatGPT: As we would like to offer these products to our customers at the beginning of September, we kindly ask you to confirm the order within 8 working days.
Claude 3.5 Sonnet: As we would like to offer these products to our customers at the beginning of September, please confirm the order within 8 working days.
Gemini 2.0 Flash: As we would like to offer the ordered products to our customers at the start of September, please confirm the order within 8 working days.
164 Nataša Gajšt Applications of AI-driven Tools in Translating and Drafting Commercial Correspondence ...

Table 7. Translating polite expressions – request for action.
Source (Slovenian): Prav tako vas prosimo, da pri izstavitvi fakture upoštevate 3% popust, ki ga zagotavljate za naročila, dana do 15. junija 20__.
ChatGPT: We also kindly ask that you apply the 3% discount you offer for orders placed before June 15, 20__, when issuing the invoice.
Claude 3.5 Sonnet: We also request that you apply the 3% discount guaranteed for orders placed before June 15th, 20__, when issuing the invoice.
Gemini 2.0 Flash: We also kindly ask you to take into account the 3% discount you provide for orders placed by June 15th, 20__, when issuing the invoice.

The example in Table 8 illustrates how the selected AI-driven tools translated two consecutive sentences containing a request. The three tools used a variety of formulaic expressions and did not repeat the same expression in the second sentence.

Table 8. Translating polite expressions – request for action.
Source (Slovenian): Prosimo za vaš prodajni katalog ter cenik z navedenimi cenami CIP Maribor. Prav tako prosimo, da nam posredujete vaše prodajne pogoje, vključujoč količinske popuste ter dobavne roke.
ChatGPT: Please provide us with your sales catalog and price list, including CIP Maribor prices. We would also appreciate information on your sales conditions, including bulk discounts and delivery times.
Claude 3.5 Sonnet: Please send us your sales catalog and price list with prices quoted CIP Maribor. Additionally, please provide your sales terms, including volume discounts and delivery times.
Gemini 2.0 Flash: Please send us your sales catalog and price list with CIP Maribor prices listed. We also ask that you provide us with your sales terms, including volume discounts and delivery times.
Apart from politeness, we can also observe differences in the levels of formality, ranging from the more formal “apply” to “take into account” (Table 7) and from “provide us with” to “send us” (Table 8). Focusing on instances of nominalisation, we note that the small sample in our study did not include many such cases. The first example in Table 9 shows that two AI-driven tools translated a Slovenian verbal structure as a nominal structure, and only one tool (Gemini 2.0 Flash) kept the verbal structure. The second example, on the other hand, shows only one case of nominalisation among the translated sentences.

Table 9. Examples of nominalisation in translations.
Source (Slovenian): Veseli nas, da ste se tako hitro odzvali na naše naročilo.
ChatGPT: We appreciate your prompt response to our order.
Claude 3.5 Sonnet: We appreciate your prompt response to our order.
Gemini 2.0 Flash: We are pleased that you responded so quickly to our order.

Source (Slovenian): Upam, da bomo tudi v prihodnje lahko sodelovali enako dobro kot doslej.
ChatGPT: We hope that our future collaboration will be just as successful as in the past.
Claude 3.5 Sonnet: I hope we can continue to collaborate as successfully as we have done so far.
Gemini 2.0 Flash: I hope that we will be able to cooperate as well in the future as we have done so far.

Another instance of nominalisation can be observed in the next example. This time, nominalisation was present in the Slovenian sentence, and it was retained in the translations of two AI-driven tools.

Table 10. Example of nominalisation in English translations from a Slovenian nominal structure (buyer’s complaint – body of the letter).
Source (Slovenian): Po natančnem pregledu prispelega blaga smo ugotovili, da ste poslali drugačno število izdelkov, kot je bilo dogovorjeno.
ChatGPT: However, after carefully inspecting the delivered goods, we have noticed discrepancies in the quantity of items sent compared to what was agreed upon.
Claude 3.5 Sonnet: After a careful inspection of the received goods, we have discovered that you sent different quantities than agreed upon.
Gemini 2.0 Flash: After a careful inspection of the received goods, we found that you sent a different number of products than agreed.

Although English sentences containing verbal structures rather than nominalisations may be easier to read, they read as less formal: “prompt response” vs “you responded so quickly”; “our future collaboration” vs “we can continue to collaborate” and “we will be able to cooperate”. Concerning the example in Table 10, we argue that both translations of the Slovenian nominal structure “po natančnem pregledu” are written in a formal tone, i.e. “after carefully inspecting” and “after a careful inspection”. (However, it needs to be pointed out that ChatGPT’s output should be in a different tense to be grammatically correct, i.e., “…after carefully inspecting the delivered goods, we noticed discrepancies…”.) The passive voice is the third typical feature of professional texts. First, our pilot study showed that passive voice constructions in Slovenian were, as a rule, translated into English in the passive voice. On the other hand, we also found instances of translation from the active voice in Slovenian into the passive voice in English, as illustrated by the examples in Table 11.

Table 11. Examples of active and passive voice in Slovenian-to-English translations.
Source (Slovenian): Naročeno blago lahko dobavimo najkasneje v 30 dneh od prejema vašega naročila.
ChatGPT: Ordered goods can be delivered no later than 30 days from receipt of your order.
Claude 3.5 Sonnet: We can deliver ordered goods within 30 days of receiving your order.
Gemini 2.0 Flash: We can deliver the ordered goods no later than 30 days from receiving your order.

Source (Slovenian): Naše izdelke lahko pošljemo v lični darilni embalaži (cena posameznega pakiranja je dodatnih EUR 3,50 za posamezni izdelek).
ChatGPT: Our products can be packaged in elegant gift wrapping (at an additional cost of EUR 3.50 per item).
Claude 3.5 Sonnet: We can ship our products in elegant gift packaging (additional cost of EUR 3.50 per item for individual packaging).
Gemini 2.0 Flash: Our products can be sent in attractive gift packaging (the price of each individual packaging is an additional EUR 3.50 per item).

The final step in our analysis of the AI-generated translations of commercial correspondence was to perform the ease-of-read tests: the Flesch-Kincaid test and the Gunning Fog Index. The results are given in Table 12 for each AI-driven tool.

Table 12. Ease-of-read results per AI-driven tool (translated letters).
Flesch-Kincaid: ChatGPT 49.51; Claude 3.5 Sonnet 52.42; Gemini 2.0 Flash 56.80
Gunning Fog Index: ChatGPT 15.65; Claude 3.5 Sonnet 14.83; Gemini 2.0 Flash 14.44

The data in Table 12 show that the ChatGPT-produced texts are overall the most difficult to read among the three translations, and the Gemini 2.0 Flash-generated ones the easiest. This was also evident from the translated sentences, as shown by the following examples (Table 13).

Table 13. Examples of sentences – ease-of-read.
Source (Slovenian): Ker bi radi ohranili dobro sodelovanje z vašim podjetjem, vas prosimo, da sprejmete naše iskreno opravičilo.
ChatGPT: As we value our cooperation with your company, we sincerely hope you will accept our apology.
Claude 3.5 Sonnet: As we wish to maintain good business relations with your company, we ask that you accept our sincere apology.
Gemini 2.0 Flash: As we would like to maintain good cooperation with your company, please accept our sincere apology.

Source (Slovenian): Prosimo, da nam posredujete vaš aktualni izvozni cenik in pogoje dobave ter plačilne pogoje.
ChatGPT: We would appreciate it if you could provide us with your latest export price list, along with your delivery and payment terms.
Claude 3.5 Sonnet: Please provide us with your current export price list, delivery terms, and payment conditions.
Gemini 2.0 Flash: Please send us your current export price list and delivery terms, as well as payment terms.

These examples show differences in the levels of formality, with the ChatGPT-created translations being the most formal (i.e. “we sincerely hope you will accept our apology” and “we would appreciate it if you could provide us with”) and the Gemini 2.0 Flash-created translations the most colloquial (i.e. “please accept our sincere apology” and “please send us”), although the latter still exhibit politeness and some level of formality.

4.2 Using AI-Driven Tools to Draft Commercial Correspondence in English Based on Prompts in Slovenian

The second research question addressed the ability of ChatGPT, Claude 3.5 Sonnet and Gemini 2.0 Flash to draft commercial correspondence in English based on instructions in Slovenian. The tools were not given highly structured instructions as with the letters for research question 1. Instead, they were given comprehensive guidelines including the main pieces of information to be included in the letters (this information did not precisely follow the standard steps as prescribed by the advice on constructing commercial correspondence letters). Structure-wise, we found no major differences between the outputs of the three AI tools. As a rule, they all followed the typical ‘opening – body – conclusion’ format. Also, all three AI tools put the content of the letters into separate paragraphs, which further contributed to the overall visual presentation. The only major structural difference was the use of bullet points to make the letters easier to read.
Regarding the levels of formality, the use of passive voice and nominalisation, we concluded that the letters drafted by the AI-driven tools practically did not differ from those translated by the same tools. That is, the level of formality that was evident in each tool’s translations was also reflected in its drafted letters. This leads us to conclude that, within the scope of this study, these three AI-driven tools are very consistent in their output. Given the limitations of this paper, we do not include specific examples in this section. As with research question one, we also performed the ease-of-read tests on the AI-generated letters, the Flesch-Kincaid test and the Gunning Fog Index (see Table 14 for the results).

Table 14. Ease-of-read results per AI-driven tool (drafted letters).
Flesch-Kincaid: ChatGPT 38.33; Claude 3.5 Sonnet 30.10; Gemini 2.0 Flash 45.59
Gunning Fog Index: ChatGPT 16.62; Claude 3.5 Sonnet 19.39; Gemini 2.0 Flash 14.63

The data in Table 14 show that all three tools produced texts that are difficult to read according to the two ease-of-read tests, the most difficult texts being produced by Claude 3.5 Sonnet, followed by ChatGPT and Gemini 2.0 Flash. Compared with the results in Table 12, where the texts were translations, Claude 3.5 Sonnet here produced the most complex texts. Based on these scores, it might be assumed, within the scope of this pilot study, that Gemini 2.0 Flash and ChatGPT are more suitable for drafting commercial correspondence in line with the plain English guidelines and the trends regarding Business English as a lingua franca. This may also lead to the assumption that Gemini 2.0 Flash is the most suitable for the translation of commercial correspondence because it generates clear and easy-to-read texts in a rather neutral professional tone, avoiding excessive formality. That is, it seems to produce texts that prioritize readability without compromising on accuracy or professionalism.
All this, however, cannot be generalized beyond the scope of our pilot study. Based on the ease-of-read scores for translations and drafted letters alike, we conclude that ChatGPT’s outputs are the most formal and may be better suited to some legal contexts. For everyday commercial correspondence between buyers and sellers, however, especially since most participants in international business contexts are not native speakers of English, the less formal outputs produced by Gemini 2.0 Flash in particular would strike the right balance between the formality of commercial correspondence and the need for clear and easily readable commercial correspondence letters in English. As for Claude 3.5 Sonnet, its main strength lies in the fact that it offers the option of selecting the style of its outputs, i.e., normal, concise, explanatory and formal, thus enabling the user to adapt the message’s level of formality to its receiver and purpose. This, of course, can also be achieved with the other two AI-driven tools, provided that the prompts include instructions on the level of formality. Linking our findings with these AI-driven tools’ own descriptions of their capabilities, our pilot study indicated that all three can translate and draft various forms of sentences and commercial correspondence letters from Slovenian into English while maintaining a clear and professional tone and following standard formatting conventions.

5 Conclusion

The aim of our small-scale pilot study was to test how selected AI-powered tools can be used for translating and drafting commercial correspondence letters. To this end, we chose three freely available tools, ChatGPT, Claude 3.5 Sonnet and Gemini 2.0 Flash, and analysed the similarities and differences in their outputs.
Our findings have shown that all three AI tools performed their tasks in accordance with the general guidelines and principles of writing commercial correspondence in English in international business contexts. They accurately translated or drafted the messages in the given letters or instructions in Slovenian, and the tone of the outputs was largely appropriate, ranging from a more formal to a more neutral level of formality. As these tools are based on LLMs (large language models), their outputs are also grammatically accurate. In short, they are consistent in tone and style, and they follow the overall norms of commercial correspondence as a specific text genre. These AI-driven tools essentially have similar core capabilities when it comes to commercial correspondence in terms of professional communication styles in line with genre conventions as presented in English commercial correspondence textbooks and guidelines. Among the limitations of our pilot study is its scope, since it was based on a limited number of texts. Furthermore, we focused on a few selected elements of analysis; we chose not to analyse the terminological accuracy of translated specialized terms; and the tools we used may not consider the reader’s professional knowledge and background or familiarity with the topic of the message (a readability test issue). Regarding the linguistic capabilities of AI-driven tools for commercial correspondence translation and drafting, this pilot study did not test them from the perspective of other cultural contexts. Also, we did not focus on specialized terminology, as this would require a different study design and focus. In addition, we used only basic prompts, which might need to be refined. Despite these limitations, our qualitative pilot study offers valuable insight into the potential of AI applications in professional written communication.
Our findings will be of interest to linguists and professional users alike, as they provide a glimpse into the capabilities of AI-driven tools for translating or drafting professional texts. The findings could also have implications for teaching language and language for specific purposes to translation trainees (cf. Koletnik, Kirbiš, and Zupan 2023) and English language students (cf. Tica and Krsmanović 2024). Although a small-scale study, it adds to the knowledge of how AI, as a fast-evolving phenomenon, can facilitate written business communication. Yet we need to bear in mind that, despite its benefits, the outputs still need human oversight and potential revision – as was stated by the AI tools themselves when prompted to describe their abilities. A natural progression beyond this study could stem from its very limitations. Since AI-driven tools are evolving rapidly, new and more extensive studies are encouraged; these could include a larger body of texts in the analysis, compare the outputs after refining the prompts (e.g., by using a more neutral tone, or adapting the output to British English or American English standards), test the AI-driven tools’ translation capabilities on other professional text types and genres, or focus on the accuracy of terminology translation, the correct use of modal verbs, or the grammatical accuracy of the AI-driven tools’ outputs. Also, any specific aspect of genre conventions (such as the use of passive voice or other structures) could be analysed in greater detail.

References

Abegg, Birgit, and Michael Benford. 1999a. Communication for Business, Satzbausteine. Hueber Verlag.
—. 1999b. Communication for Business: Zeitgemäße englische Handelskorrespondenz und Bürokommunikation. Lehrbuch. Hueber Verlag.
Alisherovich, Raimov Lazizjon. 2024.
“The peculiarities of artificial intelligence and human translation.” Multidisciplinary Journal of Science and Technology 4 (6): 692–96.
Armitage-Amato, Rachel. 2005. Poslovni stiki, Angleščina: [dokumenti, pisma, e-sporočila, pogovori ...: jezikovni priročnik]. 1. izd. PONS. Rokus.
Ashley, A. 2003. Oxford Handbook of Commercial Correspondence. Oxford University Press.
Bailey, Edward P. 1996. Plain English at Work: A Guide to Writing and Speaking. Oxford University Press.
Bennie, Michael. 2021. Guide to Good Business Communications: How to Write and Speak English Well in Every Business Situation. How To Books.
Biber, Douglas, Stig Johansson, Geoffrey N. Leech, Susan Conrad, and Edward Finegan. 2021. Grammar of Spoken and Written English. John Benjamins.
Cardon, Peter, Carolin Fleischmann, Jolanta Aritz, Minna Logemann, and Jeanette Heidewald. 2023. “The challenges and opportunities of AI-assisted writing: Developing AI literacy for the AI age.” Business and Professional Communication Quarterly 86 (3): 257–95. https://doi.org/10.1177/23294906231176517.
Carey, John A., ed. 2002. Business Letters for Busy People: Time Saving, Ready-to-Use Letters for Any Occasion. Career Press.
Claude.ai. n.d. “Claude 3.5 Sonnet.” https://claude.ai.
Clickhelp.com. n.d. “Gunning Fog Index.” https://clickhelp.com/software-documentation-tool/user-manual/gunning-fog-index.html.
Davis, Kenneth W. 2010. Business Writing and Communication: The McGraw-Hill 36-Hour Course. 2nd ed. McGraw-Hill Professional.
Emara, Eman Abd El-Hafeaz Mohamad. 2024. “Using AI tools to enhance translation skills among basic education English major students.” CDELT Occasional Papers in the Development of English Education 86 (1): 339–80.
Gajšt, Nataša. 2014. “Business English as a lingua franca – A cross-cultural perspective of teaching English for business purposes.” ELOPE: English Language Overseas Perspectives and Enquiries 11 (2): 77–87. https://doi.org/10.4312/elope.11.2.77-87.
Gemini. 2024.
“Gemini 2.0 Flash.” https://gemini.google.com/app?hl=en-GB.
Halimi, Sonia, and Said M. Shiyab. 2015. Writing Business Letters Across Languages: A Guide to Writing Clear and Concise Business Letters for Translation Purposes. Cambridge Scholars Publishing.
Halliday, M.A.K., and Christian M.I.M. Matthiessen. 2004. An Introduction to Functional Grammar. 3rd ed. Arnold.
Hribar, Nataša. 2018. “Tvornik in trpnik.” Pravna praksa 38 (7–8): 46.
—. 2021. “Trpnik.” Pravna praksa 40 (33): 34.
Jaruwatsawat, Manassa, Chutimon Khiaosen, Waraporn Sriram, and Suphakit Phoowong. 2024. “EFL learners’ perspectives on using AI translation applications.” BRU ELT JOURNAL 2 (3): 252–67. https://doi.org/10.14456/bej.2024.17.
Kalin Golob, Monika. 2002. “Slovenščina v pravni praksi (73. del): ‘Kdo se boji trpnika?’.” Pravna praksa 21 (4): 35.
Koletnik, Melita, Andrej Kirbiš, and Simon Zupan. 2023. “Prevajalce poučujemo jezik drugače, mar ne?” Ars & Humanitas 17 (1): 109–23. https://doi.org/10.4312/ars.17.1.109-123.
Krajnc Ivič, Mira. 2020. “Obravnava besedil: Merila za razlikovanje med besedilno vrsto in besedilnim tipom.” Slavistična revija 68 (1): 55–71. https://srl.si/ojs/srl/article/view/2020-1-1-4.
Kruk, Mariusz, and Agnieszka Kałużna. 2025. “Investigating the role of AI tools in enhancing translation skills, emotional experiences, and motivation in L2 learning.” European Journal of Education 60 (1): 1–12. https://doi.org/10.1111/ejed.12859.
Leech, Geoffrey, and Jan Svartvik. 1990. A Communicative Grammar of English. Longman.
Linguapress.com. n.d. “Flesch-Kincaid readability and EFL.” https://linguapress.com/teachers/flesch-kincaid.htm.
Lougheed, Lin. 2003. Business Correspondence: A Guide to Everyday Writing: Intermediate. 2nd ed. Longman, Pearson Education.
Marzuki, Utami Widiati, Diyenti Rusdin, Darwin, and Inda Indrawati. 2023.
“The impact of AI writing tools on the content and organization of students’ writing: EFL teachers’ perspective.” Cogent Education 10 (2): 2236469. https://doi.org/10.1080/2331186X.2023.2236469.
Miller, Nic. 2024. “The Flesch–Kincaid readability test.” Flowpoint.ai. Flowpoint. https://flowpoint.ai/blog/flesch-kincaid.
Moneus, Ahmed Mohammed, and Yousef Sahari. 2024. “Artificial intelligence and human translation: A contrastive study based on legal texts.” Heliyon 10 (6): e28106. https://doi.org/10.1016/j.heliyon.2024.e28106.
Ning, Jing, and Haidong Ban. 2024. “Application of translation technology in AI-powered translation workshop.” The Educational Review, USA 8 (10): 1242–49. https://doi.org/10.26855/er.2024.10.008.
O’Neill, Errol M. 2016. “Measuring the impact of online translation on FL writing scores.” IALLT Journal of Language Learning Technologies 46 (2): 1–39.
OpenAI. n.d. “ChatGPT.” OpenAI. https://chatgpt.com/.
Orel Kos, Silvana. 2024. “Introduction of machine translation into audiovisual translation teaching.” ELOPE: English Language Overseas Perspectives and Enquiries 21 (1): 185–208. https://doi.org/10.4312/elope.21.1.185-208.
Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech, and Jan Svartvik. 1985. A Comprehensive Grammar of the English Language. Longman.
“Readability Checker – Reading Level Calculator.” 2024. Charactercalculator.com. https://charactercalculator.com/readability-checker/.
Rogerson-Revell, Pamela. 2007. “Using English for international business: A European case study.” English for Specific Purposes 26 (1): 103–20. https://doi.org/10.1016/j.esp.2005.12.004.
Saitkhanova, Aziza. 2024. “Artificial intelligence in translation: Benefits and drawbacks.” International Journal of Scientific Trends 3 (11): 70–76.
Sankrusme, Sinee. 2017. International Business Correspondence. Anchor Academic Publishing.
Suhardiman, Sani, Anggy Giri Prawiyogi, Dedy Frianto, Bunga Putri Maulia, and Zhuldiz Anay. 2024. “Need a translation?
AI or human.” The Conference of EFL Studies 1 (1): 12–23.
Talbot, Fiona. 2009. How to Write Effective Business English: The Essential Toolkit for Composing Powerful Letters, Emails and More, for Today’s Business Needs. Kogan Page Publishers.
Taylor, Shirley. 2012. Model Business Letters, Emails & Other Business Documents. Pearson Education.
Terk, Natasha. 2016. Writing at Work. The Write It Well Series on Business Communication. Write It Well.
Terk, Natasha, and Janis Fisher Chan. 2014. Effective Email: Concise, Clear Writing to Advance Your Business Needs. Write It Well.
Tica, Lena, and Ivana Krsmanović. 2024. “Overcoming the writer’s block? Exploring students’ motivation and perspectives on using ChatGPT as a writing assistance tool in ESP.” ELOPE: English Language Overseas Perspectives and Enquiries 21 (1): 129–49. https://doi.org/10.4312/elope.21.1.129-149.
Wallwork, Adrian. 2014. Email and Commercial Correspondence: A Guide to Professional English. Springer.
Wilson, Kevin, and Jennifer Wauson. 2010. The AMA Handbook of Business Writing: The Ultimate Guide to Style, Grammar, Usage, Punctuation, Construction, and Formatting. AMACOM/American Management Association.

Simon Zupan, Zmago Pavličič, Melanija Larisa Fabčič
University of Maribor, Slovenia
2025, Vol. 22 (1), 171-184(228)
journals.uni-lj.si/elope
https://doi.org/10.4312/elope.22.1.171-184
UDC: [811.111’367.622=163.6:62]:004.89

Machine Translation of Independent Nominal Phrases in Technical Texts

ABSTRACT

This paper deals with machine translations of independent noun phrases in technical texts, which are not part of any sentence structure but function on their own, typically in tables and illustrations. Such nominal structures are common in technical texts because they allow technical writers to increase lexical density and precision of expression. On the other hand, these phrases pose a challenge for machine translation engines, as their meaning depends on the context.
Independent noun phrases from a service manual, which were translated from English into Slovene by two different machine translation engines (DeepL and Google Translate), are considered in this paper. Their comparison with the original revealed some limitations of machine translation engines in translating noun phrases, since approximately half of the phrases showed a noticeable change in meaning.

Keywords: technical texts, machine translation, nominal phrases, translation shifts, technical translation

Strojno prevajanje samostojnih samostalniških besednih zvez v tehničnih besedilih

IZVLEČEK

Prispevek obravnava strojne prevode samostojnih samostalniških besednih zvez v tehničnih besedilih, ki niso del stavčnih struktur, temveč se pojavljajo zunaj konteksta, najpogosteje v preglednicah in grafičnih prikazih. Tovrstne besedne zveze se pogosto pojavljajo v tehničnih besedilih, saj piscem omogočajo večjo leksikalno gostoto in konciznost pri izražanju. Po drugi strani predstavljajo izziv za strojne prevajalnike, saj je njihov pomen odvisen od sobesedila. V prispevku so obravnavane samostoječe samostalniške besedne zveze iz servisnega priročnika, ki so bile iz angleščine v slovenščino prevedene z dvema različnima strojnima prevajalnikoma (DeepL in Google Translate). Njihova primerjava z izvirnikom je pokazala nekatere omejitve strojnih prevajalnikov pri prevajanju samostalniških besednih zvez, saj se je pri približno polovici besednih zvez opazno spremenil njihov pomen.

Ključne besede: tehnična besedila, strojno prevajanje, samostalniške besedne zveze, prevodni premiki, prevajanje tehničnih besedil

1 Introduction

Technical translation is a specialized branch of translation studies that focuses on conveying technical content across languages.
Despite its critical role in global communication, it has historically received less academic attention than other translation domains, even though it accounts for a sizeable portion of worldwide translation output (Kingscott 2002). As is the case with other types of texts, many technical texts today are machine-translated. One of the questions this raises is how translation engines deal with the specific characteristics of technical texts, such as the use of specialized terminology, lexical density, conciseness, or the frequent use of passive voice. The purpose of the present study is to examine how machine translation engines deal with independent nominal phrases, which are common in technical texts, where data is presented in tables or images. The article has two parts: in the first, theoretical part, the major features of technical translation and machine translation are presented. In the second, empirical part, independent nominal phrases from a service manual in English are compared to their translations generated by two machine translation engines and analysed. The article ends by drawing conclusions from the analysis.

2 Technical Translation

Technical translation is a field of translation studies that focuses on texts with technical content. Although it is often referred to together with scientific translation (e.g., Olohan 2016), significant differences exist between the two areas. The main characteristic of scientific texts is that they “discuss, analyze and synthesize information with a view to explaining ideas, proposing new theories or evaluating methods,” while technical texts are “designed to convey information as clearly as possible” (Byrne 2014, 2). Technical texts thus represent an applicative extension of scientific texts.
From a research standpoint, it is notable that, in comparison with some other fields of translation studies, this field has received little scholarly attention, given that technical translation is estimated to represent as much as 90% of global translation output (Kingscott 2002, 247). Indeed, according to the BITRA bibliography of translation research, only 9.3% of publications address technical translation (Aixelá 2004). In practical terms, technical texts refer to a variety of documents with technical content. These range from user manuals and expert technical reports written in narrative linear prose, on the one hand, to data sheets with tables, lists of nominal phrases and little context, on the other (cf. Byrne 2014, 58–73). In turn, technical writing features different textual and linguistic characteristics, depending on its purpose and target readers. One common observation is that the language of technical writing is expected to be clear, simple, and concise (Herman 1993, 11; Byrne 2014, 48). In contrast to literary texts, for example, technical texts typically do not abound in elements such as figures of speech, rhyme, or convoluted sentences; instead, technical writing is expected to be clear, objective, and unambiguous. Another characteristic that is directly or indirectly discussed in every treatise on technical translation (e.g., Galinski and Budin 1993; Byrne 2006; 2014; Olohan 2022) is terminology, which refers to a specialized subset of concepts and vocabulary that typify a particular subject area. Indeed, Pinchuck (1977, 19) claims that vocabulary is the most significant linguistic feature of technical writing. Although Newmark (2008, 151) refuted that, claiming that terminology usually constitutes only 5–10% of the total content of technical texts, terminology remains an essential element of technical writing.1
In contrast to other types of texts, technical texts often also visually distinguish themselves through multimodality, given that they include diagrams, graphs or photographs to complement the verbal text (Byrne 2014, 54). In addition to the use of the passive voice or the prevalence of the present tense, one prominent linguistic feature of technical texts is nominalization (Newmark 2008, 151; Olohan 2022, 329). The frequency of nominal structures in technical texts is not surprising, given that technical writing strives for conciseness, and nominal structures deliver precisely that: lexical density. In the discourse of science and technology, the phenomenon was analysed in detail by the functional linguist Michael Halliday (2004). According to him, one result of the evolution of technical writing was that it helped organize grammar as a resource for generating meaning in metaphorical ways. This meant that items such as adjectives and verbs referring to “qualities” and “processes” were first decoupled from their original lexical realizations, and then both meanings were recoupled through the new grammatical category of noun. One such example is the word length, which carries the quality of the adjective “long” (“quality”), but also belongs to the grammatical category of noun, i.e., the nominal meaning of “entity” or “thing.” Given that such words carry two category meanings, Halliday calls the phenomenon “grammatical metaphor.” The advantage of such structures is that it is possible to compress and combine multiple meanings into nominal phrases. On the other hand, this becomes a problem when overly concise and compressed structures become ambiguous or even incomprehensible (Byrne 2006, 83). One famous example is the phrase lung cancer death rates, which can mean anything from the number of deaths from lung cancer, on the one hand, to the amount of time in which patients with lung cancer die, on the other (Halliday 2004, 170).
As Halliday’s example also shows, the problem of ambiguity is compounded when such phrases appear with little or no context, which is often the case in technical writing, with its abundance of tables and illustrations. In Slovenia, the field of technical translation in conjunction with machine translation remains under-researched, with most scholars focusing on other types of translation (e.g., Mezeg 2023; Orel Kos 2024).

1 In Olohan’s (2022) monograph Scientific and Technical Translation, for example, the terms terminology and terminological appear over one hundred times on 250 pages.

3 Machine Translation

Machine translation (MT) automates the production of a target-language text from a source-language text. Over the decades, scientists have worked on various approaches to MT (for an overview, see Naveen and Trojovský 2024; Araghi and Palangkaraya 2024). Previously, the two most recognizable ones were Rule-Based MT and Statistical MT. In recent years, however, neural machine translation (NMT) has become the most promising new avenue, utilizing models loosely inspired by the human brain, which employ artificial neural networks (see, for example, Zhang and Zong 2020). NMT involves two phases: encoding and decoding. During the encoding phase, each word in the source text is given a distinct neural representation, or embedding. The word embeddings are subsequently combined to form a sentence-level representation. This process modifies the individual representations based on context, resulting in a contextualized interpretation. During the decoding phase, the sentence-level representation is systematically broken down to produce the target sentence one word at a time. These two phases are carried out by interconnected artificial neural networks – the encoder and the decoder – together forming a unified network (Pérez-Ortiz et al. 2022).
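The two phases described above can be made more concrete with a deliberately simplified sketch. The following toy Python example is illustrative only and does not reproduce the architecture of any actual NMT system: the two-dimensional “embeddings” and the miniature English and Slovene vocabularies are invented, and the attention-style weighting merely mimics how encoding contextualizes each word before decoding selects target words one at a time.

```python
import math

# Toy source-side "embeddings" (real NMT systems learn high-dimensional
# vectors from millions of sentence pairs; these values are invented).
EMB = {"front": [1.0, 0.0], "axle": [0.0, 1.0], "mast": [0.7, 0.7]}

# Invented target-side vectors for a toy Slovene vocabulary.
TGT = {"sprednja": [1.0, 0.0], "os": [0.0, 1.0], "jambor": [0.7, 0.7]}

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def encode(words):
    """Encoding phase: each word's embedding is mixed with every other
    embedding via attention-style weights, yielding contextualized vectors."""
    vecs = [EMB[w] for w in words]
    contextualized = []
    for q in vecs:
        weights = softmax([dot(q, k) for k in vecs])
        ctx = [sum(w * k[i] for w, k in zip(weights, vecs)) for i in range(len(q))]
        contextualized.append(ctx)
    return contextualized

def decode(ctx_vectors):
    """Decoding phase: emit, one word at a time, the target word whose
    vector is closest (by dot product) to each contextualized vector."""
    return [max(TGT, key=lambda t: dot(ctx, TGT[t])) for ctx in ctx_vectors]

print(decode(encode(["front", "axle"])))  # prints ['sprednja', 'os']
```

The sketch also hints at why context matters so much in NMT output: the decoder can only choose among representations shaped by the surrounding words, so an isolated phrase gives it very little to work with.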
NMT can generate non-existent target language words and fluent but inaccurate translations, and the fluency of the output may mask these inaccuracies. Like other technologies trained on large text datasets, NMT can also amplify biases present in its training data. NMT systems require significant training time, computing power, energy, and specialized hardware (GPUs). They also demand massive training datasets, which are unavailable for all language pairs (Kenny 2022). This reliance on automated solutions, however, may divert attention from the critical, in-depth analysis of the source text – a drawback also observed by Koletnik Korošec (2011), who noted that unstructured use of machine translation can undermine thorough source text evaluation. The present research used two publicly available, multilingual neural machine translation services, DeepL Translator and Google Translate. DeepL Translator, like most translation systems, employs artificial neural networks for text translation. These networks undergo training on extensive datasets comprising millions of translated texts. DeepL’s website (How does DeepL Work 2021) reports numerous enhancements to the underlying neural network methodology. While most publicly available translation systems are direct modifications of the transformer architecture, DeepL’s networks, although they incorporate elements of this architecture such as attention mechanisms, reportedly feature significant topological differences that contribute to a substantial improvement in translation quality compared to the public research state of the art. A strong focus is placed on the targeted acquisition of specialized training data to enhance translation quality. This involves the development of specialized crawlers designed to locate and automatically assess the quality of translations available online.
While public research typically employs supervised learning for network training, where the network iteratively compares generated translations with training data translations and adjusts weights based on discrepancies, DeepL reportedly utilizes additional techniques from other machine learning domains to achieve notable improvements. Training is conducted on networks with many billions of parameters. Emphasis is placed on efficient parameter utilization, enabling comparable translation quality to be achieved with smaller, faster networks. DeepL currently offers two distinct language models for the translation of specific language pairs: a classic model and a next-generation model. The classic language model uses DeepL’s established AI neural network architecture for translation and is available for all supported languages. Over 800 language combinations are currently possible, including Slovene. DeepL Translator also supports translations into British English and American English. The next-generation language model is powered by a large language model (LLM) infrastructure. This LLM leverages extensive multilingual text corpora to address complex problems and is specifically trained for translation. Using proprietary LLMs within the next-generation model improves translation quality, particularly for longer texts. Specialized LLM infrastructure, uniquely tuned for language processing, facilitates more human-like translations and reduces the risk of hallucinations and misinformation. Furthermore, unlike general-purpose models trained on publicly sourced internet data, DeepL’s next-generation model benefits from over seven years of proprietary data curated for translation and content creation. Currently, however, the next-generation language model does not support Slovene (“About the Next,” n. d.). Google Translate is the second publicly available multilingual neural machine translation service used in the present research.
Like DeepL, it offers a website interface, mobile applications for Android and iOS, and an application programming interface (API). As of February 2025, it supports 249 languages and language varieties at various levels. Launched in April 2006 as a statistical machine translation service, it gathered initial linguistic data from United Nations and European Parliament documents and transcripts. For most supported language combinations, texts were initially not translated directly but first translated into English and then from English into the target language. In September 2016, Google’s research team announced the development of the Google Neural Machine Translation system (GNMT) to enhance fluency and accuracy, and in November of the same year, Google Translate transitioned to GNMT. This system employed an extensive end-to-end artificial neural network utilizing deep learning. GNMT improved translation quality compared to statistical machine translation by employing an example-based machine translation (EBMT) method, learning from millions of examples. Whole sentences were translated at once rather than piecemeal. This broader context facilitated the identification of more relevant translations, which were subsequently rearranged and adjusted for improved grammatical accuracy and human-like fluency. Since 2020, GNMT has been phased out and replaced by deep learning networks based on transformers. Despite advancements in automated translation, Google’s engineers acknowledge that its quality remains imperfect, especially for low-resource languages. Even the latest models are susceptible to common machine translation errors, such as “poor performance on particular genres of subject matter (domains), conflating different dialects of a language, producing overly literal translations, and poor performance on informal and spoken language” (Caswell and Liang 2020).
4 Empirical Study

To evaluate machine translation, independent nominal phrases were compared with their respective machine translations. “Independent nominal phrases” in this paper refers to phrases that meet the following two criteria: 1) they have nouns as their heads; and 2) they are not an integral part of any sentence but instead appear on their own, outside any (explicit) syntactic structure, in technical texts typically in tables and illustrations. The text used in the analysis was a service repair manual for the diesel and gasoline Caterpillar forklifts of the GP and DP 15K, 18K, 20K, 25K, 30K, 35K series (Pub. No. 99719-60120), which were produced between the mid-1990s and 2007 (Caterpillar LPG n. d.)2.

2 The authors want to thank Darko Rihard and Marko Fajfar from Vilfis d.o.o. for their help with forklift truck-related terminology.

The original text was in English and was available in electronic form as a readable PDF document. The source texts were not additionally pre-formatted before translation. The complete manual comprised 384 pages. For the study, the first thirty pages of the manual were machine-translated into Slovene using Google Translate (GT) and the professional (subscription-based) version of DeepL (DL). Next, the first one hundred independent units comprising nominal phrases from an illustration and a table on pages 1-2 to 1-5 were extracted and aligned with their two machine translations. Repetitions of identical phrases with identical translations were excluded. Most translation units were simple nominal phrases with single noun heads (e.g., front axle), including single-word phrases (e.g.
mast), while a small number of other units comprised sets of (appositional) nominal phrases separated by parentheses, slash or colon (e.g., Kg/mm (lb/in.); tread (front/double tires); applicable truck model designation 35: 3.3 ton class). These examples were treated as single translation units because they functioned as one unit of meaning. The original text included a few typos and grammatical errors, which remained uncorrected because the idea was to see how translation engines would deal with these. In total, the corpus of one hundred source units in English comprised a total of 305 words and 139 lemmas (e.g., truck, trucks are two words, the base form of which (truck) corresponds to one lemma)3. Some phrases reappeared in identical form in the corpus several times: the most frequent, for example, was serial number, which recurred six times; four phrases (e.g., simplex mast; duplex mast) appeared four times; the rest had fewer recurrences. All one hundred source units were compared to their corresponding translations generated by the two translation engines and evaluated qualitatively and quantitatively. In the absence of a specific model for describing translation shifts in acontextual nominal phrases, descriptors were adapted from other translatological models and theories, such as those by Leuven-Zwart (1989; 1990); Klaudy and Károly (2005); Toury (2012); and Krüger (2015). Following a preliminary comparison and analysis of various types of nominal phrases, the following descriptors were used to describe the relationship between the source and target translation units generated by the same translation engine: No shift. Source and target phrases have equal or near-equal semantic, formal, and functional properties. Examples include phrases such as general information, whose Slovene translation splošne informacije is considered both a formal and functional equivalent of the English phrase. 
Other examples include target phrases that have several possible lexical varieties (e.g., vrsta motorja or tip motorja for engine type), all of which are considered adequate.

1. Semantic shift. The semantic gap is too large to infer the meaning of the source phrase based on the translation. A typical example is the phrase duplex mast, which was translated as dvostranski jambor in Slovene. Although the head noun jambor corresponds to the English noun mast, it does so only in the context of sailboats; in the context of heavy machinery, the correct technical term in Slovene is jarem or teleskop. In addition, the adjective duplex, referring to the two stacks or sections of the mast that can be extended vertically, is translated as dvostranski, i.e., as two-sided, which likewise is a mismatch with the original meaning. The category also includes examples of made-up translations, which in the context of artificial intelligence are popularly referred to as hallucinations. One such interesting example is the word underclearnace (sic). As can be seen, the original technical term is misspelled and should have been spelled underclearance, referring to the physical distance between the frame of the forklift and the ground below it. The translation engine, however, “translated” the original phrase as podnaprava in Slovene, which is practically a nonexistent noun in the Slovene lexicon: only one or two references to it could be found, and even those came from the unrelated domain of emission allowances.

3 The corpus was analysed with Sketch Engine (http://www.sketchengine.eu).

2. Terminological shift. Source and target phrases overlap semantically to the extent that the meaning of the original phrase can be inferred; however, the term used is a general or non-standard expression and not an established or standard technical term.
An example of this type of shift is the phrase single wheels, referring to the number of parallel wheels at the same end of the forklift truck axle. While GT used the correct technical equivalent enojna kolesa in Slovene, DL translated the phrase as posamezna kolesa, whose back-translation is individual wheels. Although posamezna kolesa could apply in other contexts and its meaning can be deciphered, the correct technical term in this context is enojna kolesa. The category also includes examples of poor style such as poimenovanje, which appeared as the DL equivalent of the English phrase designation, referring to the type of forklift truck; a stylistically and terminologically better translation in Slovene would have been oznaka.

3. Grammatical shift. Source and target phrases have different grammatical features. Given that English is an analytic language and Slovene, as the target language, a synthetic one with several inflectional morphemes, target phrases are expected to deviate from the source ones grammatically; also possible are grammatical disagreements (e.g., in number or gender) within target phrases. One such example is the English phrase minimum intersecting isle (sic), which was translated as najmanjša otok, where the feminine suffix -a in the attributive adjective disagrees with the masculine head noun otok – the morphologically correct version of the phrase would be najmanjši otok.

4. Orthographic shift. The target text features orthographic shifts such as incorrect hyphenation, capitalization, etc. One example is the abbreviation Ref. No., where both abbreviated words are capitalized in English. The first letters in the corresponding Slovene translation Ref. Št. are likewise capitalized; however, this conflicts with the rules of Standard Slovene, according to which lower case should have been used in the second abbreviation.

5. Terminological inconsistency.
The same source phrase is translated in various ways in the target texts. Every first occurrence of a different rendering was counted. One such example in English is the phrase simplex mast, which appears in three different Slovene translations: as enostavni jambor; as dvostranski (sic) drog; and as simplex jambor.

6. No translation. In a small number of examples, no translation was provided, and the original source text phrase was reproduced in the target text, e.g.: the phrase [Mast] (square brackets used in the original text) appears as [Mast] also in Slovene (where the corresponding technical term is teleskop).

The descriptors were not discrete categories excluding one another. In a small number of units, two descriptors were used for the same translation. The target phrase najmanjša otok, for example, included a semantic shift as well as a grammatical error because of a gender disagreement between the adjective and the noun in the Slovene translation. Similarly, the phrase [podvozje] was an example of an adequate translation and thus marked as “no shift”; however, it was also marked for terminological inconsistency, because the previous iteration of the same original phrase (chassis) was machine-translated by the same engine as šasija.

5 Results

Overall, the results showed that in most categories, both engines produced translations of comparable quality. No translation shifts were observed in 46% of translations generated by DL, while GT performed slightly better with 49% of units with no translation shifts. On the other hand, both engines generated a similar proportion of translations with semantic shifts: 41% in DL vs. 42% in GT. GT also performed slightly better in terms of terminological shifts, which were observed in 8% of its translations, while in DL translations that proportion was 12%.
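For illustration, the tallying behind such a per-engine distribution can be sketched in a few lines of Python. The annotation format and the example units below are hypothetical, invented for the sketch, and the resulting numbers are not the study’s actual data; each aligned unit carries a (possibly empty) set of shift descriptors per engine, with an empty set counting as “no shift”.

```python
from collections import Counter

# Hypothetical annotations: one dict per aligned unit, mapping each engine
# to the set of shift descriptors assigned to its translation. Descriptors
# may co-occur on the same unit, mirroring the non-exclusive categories.
ANNOTATIONS = [
    {"DL": set(), "GT": set()},                        # e.g. general information
    {"DL": {"semantic"}, "GT": {"semantic"}},          # e.g. duplex mast
    {"DL": {"terminological"}, "GT": set()},           # e.g. single wheels
    {"DL": set(), "GT": {"semantic", "grammatical"}},  # e.g. minimum intersecting isle
]

def distribution(annotations, engine):
    """Return the percentage of units showing each descriptor for one engine."""
    counts = Counter()
    for unit in annotations:
        shifts = unit[engine]
        if not shifts:
            counts["no shift"] += 1
        for shift in shifts:
            counts[shift] += 1
    total = len(annotations)
    return {shift: 100 * n / total for shift, n in counts.items()}

print(sorted(distribution(ANNOTATIONS, "GT").items()))
# prints [('grammatical', 25.0), ('no shift', 50.0), ('semantic', 50.0)]
```

Because descriptors are not mutually exclusive, the percentages for one engine may sum to more than 100 – which is also why the row totals in Table 1 need not add up to exactly 100.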
In contrast, no grammatical shifts were observed in DL-generated phrases, while 4% of GT translations contained grammatical errors. Orthographic shifts were observed in 4% of both DL and GT translations. The only category with more noticeable discrepancies was terminological inconsistency, where no inconsistencies were observed in DL, whereas in GT, in 3% of the units the same technical term was translated in two or more ways. Only one percent of units remained untranslated by both translation engines. The distribution is presented in Table 1.

Table 1. Distribution of translation shifts (%).

Engine             No shift   Semantic   Terminological   Grammatical   Orthographic   Terminological    No
                              shift      shift            shift         shift          inconsistency     translation
DeepL                 46         41           12               0             4                0               1
Google Translate      49         42            8               4             4                3               1

As is the case with all quantitative data, the numbers show only part of the picture. The following is a detailed qualitative analysis of examples included in each of the seven categories.

5.1 No Shift

As indicated by the relative values, just under half of all translation units in the corpus showed no shifts, meaning that the phrases were considered functional equivalents of their source phrases. The analysis showed that this group of translation units could be divided into two subgroups. The first featured phrases which are considered common and are widely used in other, more general contexts. A typical example is the phrase General information, which appeared four times in the source English text. In both translations, it was consistently translated as Splošne informacije. The absence of shifts comes as no surprise, given that this is the standard version of the phrase in Slovene, which appears in a variety of contexts even beyond technical writing, as indicated by over 600 instances of the phrase in the largest corpus of Written Standard Slovene, Gigafida 2.0.
The same goes for other common phrases that also appear in non-technical writing, such as serial number or dimensions, whose respective Slovene equivalents serijska številka and dimenzije also appear relatively frequently in non-technical texts (cf. Gigafida 2.0). The engines also successfully dealt with a few terms that were considered more technical, such as output shaft, a mechanical part that connects the drive wheels and the gearbox, which was translated as izhodna gred by both DL and GT. However, it should be noted that neither of the translation engines translated the phrase as odgonska gred, which is another technical equivalent for the same mechanical part in Slovene.

5.2 Semantic Shifts

As evident from Table 1, over 40% of units in both translations included target text phrases whose meaning deviated from the source phrase to the extent that it made their understanding practically impossible. The analysis showed that most of these radical shifts were the result of two factors: 1) the ambiguity of phrases whose meaning is context-dependent; and 2) the properties of technical language and terminology that typically are not part of the general vocabulary. In most cases, the translation engines struggled with the same units – however, not always. A typical example of a unit whose meaning is context-dependent is the source noun truck. In the source manual, the noun is consistently used as the shorter, elliptical form of the longer phrase forklift truck, a standard expression for this type of industrial vehicle. In Slovene, however, the elliptical form poses a challenge for translation engines, given that the equivalent of truck in Slovene is tovornjak, which in turn is unrelated to forklift trucks; instead, tovornjak is the standard general Slovene term for a specialized vehicle for transporting freight. In translation, the meaning of the source unit thus changed radically.
It is also interesting that in those sections of the English manual where the complete phrase forklift truck was used, neither of the two translation engines had difficulties, and both consistently translated it as viličar, the Slovene equivalent of the phrase forklift truck; however, once the elliptical form appeared, both translation engines struggled with its interpretation, even though the phrase appeared in its longer full form elsewhere in the text. Lack of context also posed a problem in the one-word phrase reverse, which in this case referred to the travel speed for driving backwards. Although obratno, used by GT, is one of the lexical meanings of the adverb, it does not fit the context; instead, the adverb should have been vzvratno, as correctly identified by DL. A similar problem appears with the source phrase free lift, referring to “the distance a forklift operator can raise the forks without extending the mast” (“What is a free lift,” n. d.). In both machine translations, however, the phrase turned into brezplačno dvigalo, which could be back-translated as a free elevator and obviously bears no relation to the source phrase. The problem arose because the engines seemingly built the translation based on the headword lift in the source text. One of its lexical meanings is elevator; in turn, this most likely led to the use of the incorrect premodifier brezplačno, the Slovene lexical equivalent of the adjective free, i.e., something requiring no monetary compensation. In the given context, of course, the phrase is out of place. Similarly, work performance, describing the properties of the truck, was translated incorrectly by both engines as delovna uspešnost; the latter is an established phrase in Slovene, however not in the context of machinery but rather in labour relations.
A similar problem appeared in relation to the English phrase transmission serial number, which seems straightforward as a designation of the serial number of the assembly connecting the engine and wheels. Neither of the translation engines struggled with the common phrase serial number, and both adequately translated it as serijska številka. However, both misinterpreted the premodifying noun transmission. It is true that one of its lexical meanings is that of prenos (in the sense of a transfer), which is what both translation engines used, but in this case the meaning is misplaced, given that the phrase refers to a mechanical assembly.

5.3 Terminological Shifts

Terminological shifts were the third most common category of translation shifts. In contrast to semantic shifts, this category included translations that can be understood by readers but are terminologically inadequate because of a failure to comply with standard technical terminology. Both translation engines had comparable results in this category, with GT outperforming DL by a small margin. As was the case with semantic shifts, terminological shifts in both translation engines commonly appeared in one-word phrases whose meaning was highly context-dependent. An example of this is the word items, which appeared at the top of a column referring to the technical specifications of the forklift truck that are presented in the table. In DL, the noun was translated as elementi and in GT as predmeti. Although both are lexical equivalents of items, neither of the two translations fits the context; postavke would have been better. Similarly, poimenovanje (DL) and imenovanje (GT) are both close to one of the lexical meanings of the source word designation; however, oznaka is considered a more adequate technical translation. But multi-word units also posed a challenge. With some, the discrepancy was less noticeable than with others.
A case in point is the phrase disassembly diagram, referring to the diagrams in the manual that show the order or relationship in which parts are disassembled. Both engines translated the phrase as diagram razstavljanja, whose meaning is likely to be clear to most speakers of Slovene, although the established technical term in Slovene is shema razstavljanja. Another example of a phrase that the engines struggled with was travel speed. As the translation hitrost potovanja shows, the confusion likely arose from the noun travel, whose basic lexical meaning in Slovene is that of potovanje; in this context, however, the resulting phrase, in conjunction with the head noun hitrost, refers more to the pace at which tourists enjoy their travels.

5.4 Grammatical Shifts

Unsurprisingly, there were few grammatical shifts. None were observed in the DL translations, and only four in the GT translations. One of those was an example of gender disagreement between the headword and its premodifier (najmanjša otok), while two displayed a grammatical case mismatch (obremenitev porazdelitev instead of porazdelitev obremenitve and Powershift menjalnik modeli instead of modeli z menjalnikom Powershift). The last shift featured the longer phrase overall height (to top of mast lowered), where GT failed to incorporate the participle lowered into the translation; in turn, the resulting phrase featured an incorrect use of the participle in the postmodifying position: skupna višina (do vrha jambora spuščen).

5.5 Terminological Inconsistencies

Terminological inconsistencies were also infrequent. In DL, none were observed, while GT featured three units where the same term was rendered in various ways in the translation. The first involved the noun disassembly, which appeared in three different phrases.
In the first two, disassembly diagram and disassembly sequence, the noun was translated as razstavljanje; however, in the third iteration, suggestions for disassembly, the same noun appeared as demontaža, which is typically the standard technical term for the procedure described. The second term was chassis. When this noun appeared as part of a phrase, it was translated as šasija, which is an established technical term for chassis in Slovene. It is notable that in the third iteration, the noun appeared on its own as a single-word phrase and was translated as podvozje, which is a synonym for šasija in Slovene. The third and most notable example was the noun mast, referring to one of the main forklift parts, the mechanical implement for lifting or lowering the load at the front of the vehicle. In the original, it appeared as part of fifteen different phrases. The first of those iterations was the mast serial number, in which mast was translated as jambor by both engines. Although jambor is a Slovene lexical equivalent of mast, it only applies in the context of sailboats; in forklifts, the corresponding technical term is jarem or teleskop. On the same page in the manual, mast also appears in the phrase chassis and mast model identification. In this instance, GT translated the noun as drog. In the remaining iterations in the GT translation, the noun varied again between drog and jambor; in DL, it was consistently translated as jambor.

5.6 No Translation

Both translations included only one unit that remained untranslated by both translation engines: the word [mast]. One plausible reason for this was the square brackets, which may have led the engines to mistake the bracketed text for markup rather than translatable text. Another group of items that remained untranslated were imperial units of measurement, which accompanied metric units of measurement in brackets, e.g., mm (in.).
Given that metric units are standard in Slovene, the use of imperial units alongside them was acceptable.

5.7 Miscellaneous

An interesting case is the translation of words containing typos, which are not an uncommon phenomenon in texts. The case in point is the phrase underclearnace (at frame), referring to the distance between the chassis and the ground. DL translated it as podnaprava (v okvirju). Although the word sounds feasible in Slovene in terms of its form and morphological characteristics, it is not a common word and is hence an example of hallucination. DL’s misinterpretation also manifests itself in the prepositional phrase at frame, which indicates the point at which the distance from the ground is measured, whereas the Slovene translation places that same point inside the frame, suggesting that the engine misinterpreted it. It is also notable that the same word remained untranslated by GT.

6 Conclusions

This study highlights the challenges of machine translation in handling independent nominal phrases in technical texts. The comparison of Google Translate and DeepL translations into Slovene revealed both their strengths and their limitations in dealing with specialized terminology. Nearly half the translated units showed no shift, indicating that common phrases were adequately rendered. However, semantic shifts were prevalent (over 40%), often due to ambiguity and lack of contextual information. Key issues included the mistranslation of elliptical forms (e.g., truck instead of forklift truck) and the misinterpretation of industry-specific terms like free lift and transmission serial number. Terminological shifts affected precision, with general expressions replacing technical terms. While these translations were understandable, they lacked standard industry accuracy.
Grammatical and orthographic shifts were minimal, with DeepL producing no grammatical errors and Google Translate showing minor inconsistencies. However, terminological inconsistencies in Google Translate indicated weaker consistency mechanisms than in DeepL. A small number of untranslated units, such as mast, suggests formatting-related processing issues in machine translation engines. The study underscores the importance of integrating domain-specific resources and human post-editing, as well as pre-editing and pre-formatting texts, to enhance translation reliability. This observation aligns with the findings of Hazemali et al. (2024), whose evaluation of chatbot performance in reading digitized texts showed that while the chatbot used in the study exhibited some success in handling typos and minor language errors, it achieved only a 20% success rate in tasks demanding deeper language comprehension and struggled with complex sentence structures and domain-specific terminology in Slovene. Experienced human translators typically do not miss phenomena such as repetition in the immediate textual vicinity; in addition, humans can also process graphic representations of information. While both translation engines performed comparably, future improvements should focus on context recognition and the handling of specialized terminology. These findings, alongside evidence from studies in other domains – such as Mohar, Orthaber and Onič (2020), who demonstrated that machine translation quality deteriorates with increasing sentence complexity in literary texts – underscore the need for ongoing refinement of MT systems to better handle both technical and stylistically rich content.
It should be noted that technical translators, in contrast with, for example, literary translators, strive above all for precision and comprehensibility, “since the consequences of lexical error, however slight, are more serious: a poor literary translation leads to a dissatisfied reader, whereas a misleading technical translation could result in a hazard to human life” (Hann 1992, 7). Further research should explore the impact of context on machine translation accuracy and investigate AI-driven enhancements for better translation consistency and precision.

References

“About the next-generation language model.” n.d. DeepL Help Center. https://support.deepl.com/hc/en-us/articles/14241705319580-About-the-next-generation-language-model.
Araghi, Sahar, and Alfons Palangkaraya. 2024. “The link between translation difficulty and the quality of machine translation: A literature review and empirical investigation.” Language Resources & Evaluation 58: 1093–1114. https://doi.org/10.1007/s10579-024-09735-x.
Byrne, Jody. 2006. Technical Translation: Usability Strategies for Translating Technical Documentation. Springer.
—. 2014. Scientific and Technical Translation Explained. Routledge.
Caswell, Isaac, and Bowen Liang. 2020. “Recent advances in Google Translate.” Google Research Blog, June 8. https://ai.googleblog.com/2020/06/recent-advances-in-google-translate.html.
“Caterpillar LPG forklifts specifications.” n.d. Lectura specs. https://www.lectura-specs.com/en/specs/forklifts/lpg-forklifts-caterpillar.
Franco Aixelá, Javier. 2004. “The study of technical and scientific translation: An examination of its historical development.” Journal of Specialised Translation 1. https://jostrans.soap2.ch/issue01/art_aixela.php.
Galinski, Christian, and Gerhard Budin. 1993. “New trends in translation-oriented terminology management.” In Scientific and Technical Translation, edited by Sue Ellen Wright and Leland D. Wright, Jr., 209–16. John Benjamins.
Gigafida 2.0: Corpus of Written Standard Slovene. https://viri.cjvt.si/gigafida.
Halliday, M.A.K. 2004. The Language of Science, edited by J. J. Webster. Continuum.
Hann, Michael. 1992. The Key to Technical Translation. Volume 2: Terminology/Lexicography. John Benjamins.
Hazemali, David, Janez Osojnik, Tomaž Onič, Tadej Todorović, and Mladen Borovič. 2024. “Evaluating chatbot assistance in historical document analysis.” Moderna arhivistika 7 (2): 53–83. https://doi.org/10.54356/ma/2024/biub3010.
Herman, Mark. 1993. “Technical translation style: Clarity, concision, correctness.” In Scientific and Technical Translation, edited by Sue Ellen Wright and Leland D. Wright, Jr., 11–20. John Benjamins.
“How Does DeepL Work? #Network Architecture.” 2021. DeepL Blog, November 1. https://www.deepl.com/en/blog/how-does-deepl-work#network_architecture.
Kenny, Dorothy. 2022. “Human and machine translation.” In Machine Translation for Everyone: Empowering Users in the Age of Artificial Intelligence, edited by Dorothy Kenny, 24–46. Language Science Press.
Kingscott, Geoffrey. 2002. “Technical translation and related disciplines.” Perspectives 10 (4): 247–55. https://doi.org/10.1080/0907676X.2002.9961449.
Klaudy, Kinga, and Krisztina Károly. 2005. “Implicitation in translation: Empirical evidence for operational asymmetry in translation.” Across Languages and Cultures 6 (1): 13–28. https://doi.org/10.1556/Acr.6.2005.1.2.
Koletnik Korošec, Melita. 2011. “Applicability and challenges of using machine translation in translator training.” ELOPE: English Language Overseas Perspectives and Enquiries 8 (2): 7–18. https://doi.org/10.4312/elope.8.2.7-18.
Krüger, Ralph. 2015. The Interface between Scientific and Technical Translation Studies and Cognitive Linguistics with Particular Emphasis on Explicitation and Implicitation as Indicators of Translational Text-Context Interaction. Frank & Timme.
Leuven-Zwart, Kitty M. van. 1989. “Translation and original: Similarities and dissimilarities I.” Target 1 (2): 151–81.
—. 1990. “Translation and original: Similarities and dissimilarities II.” Target 2 (1): 69–95.
Mezeg, Adriana. 2023. “Ali sploh še potrebujemo prevajalce? Strojno prevajanje iz francoščine v slovenščino.” Ars & Humanitas 17 (1): 139–54. https://doi.org/10.4312/ars.17.1.139-154.
Mohar, Tjaša, Sara Orthaber, and Tomaž Onič. 2020. “Machine translated Atwood: Utopia or dystopia?” ELOPE: English Language Overseas Perspectives and Enquiries 17 (1): 125–41. https://doi.org/10.4312/elope.17.1.125-141.
Naveen, Palanichamy, and Pavel Trojovský. 2024. “Overview and challenges of machine translation for contextually appropriate translations.” iScience 27 (10): 110878. https://doi.org/10.1016/j.isci.2024.110878.
Newmark, Peter. 2008. A Textbook of Translation. Twelfth impression. Longman.
Olohan, Maeve. 2016. Scientific and Technical Translation. Routledge.
—. 2022. “Translating technical texts.” In The Cambridge Handbook of Translation, edited by Kirsten Malmkjær, 321–39. Cambridge University Press.
Orel Kos, Silvana. 2024. “Introduction of machine translation into audiovisual translation teaching.” ELOPE: English Language Overseas Perspectives and Enquiries 21 (1): 185–208. https://doi.org/10.4312/elope.21.1.185-208.
Pérez-Ortiz, Juan Antonio, Mikel L. Forcada, and Felipe Sánchez-Martínez. 2022. “How neural machine translation works.” In Machine Translation for Everyone: Empowering Users in the Age of Artificial Intelligence, edited by Dorothy Kenny, 141–64. Language Science Press.
Pinchuck, Isadore. 1977. Scientific and Technical Translation. André Deutsch.
Toury, Gideon. 2012. Descriptive Translation Studies and Beyond. Benjamins.
“What is a free lift on a forklift?” n.d. American Forklifts. https://americanforklifts.org/what-is-a-free-lift-on-a-forklift/.
Zhang, JiaJun, and Chengqing Zong. 2020. “Neural machine translation: Challenges, progress and future.” Science China Technological Sciences 63: 2028–50. https://doi.org/10.1007/s11431-020-1632-x.

2025, Vol. 22 (1), 185-201(228)
journals.uni-lj.si/elope
https://doi.org/10.4312/elope.22.1.185-201
UDC: [811.111’373.612.2:81’25]:[004.89:378]

Marija Brala Vukanović
University of Rijeka, Croatia

Translating (Metaphors) in the Age of AI: Opportunities, Challenges, and Implications for the EFL Classroom

ABSTRACT

The paper explores the use of AI translation tools in EFL classrooms, focusing on metaphor translation. We investigate the attitudes of first- and third-year English students at the University of Rijeka, Croatia, towards AI tools and evaluate three platforms – Google Translate, ChatGPT, and Glosbe – with regard to their ability to accurately translate metaphors. The findings show a generally positive student disposition towards AI tools but also highlight frequent inaccuracies in AI-generated metaphor translations. We discuss the implications of these results for EFL teaching, emphasizing the potential value of error correction as a pedagogical tool. Our analysis suggests that the limitations of AI tools can serve as valuable pedagogical resources for fostering critical engagement, improving students’ understanding of culturally and contextually embedded language, and enhancing their linguistic skills. Our findings underscore the need for an urgent and systematic integration of AI tools into classrooms.

Keywords: AI in education, machine translation, metaphors, error correction, English as a Foreign Language (EFL)

Prevajanje (metafor) v dobi umetne inteligence: priložnosti, izzivi in posledice za učilnico angleščine kot tujega jezika

IZVLEČEK

V članku raziskujemo uporabo prevajalskih orodij, ki temeljijo na uporabi umetne inteligence (UI), pri pouku angleščine kot tujega jezika (EFL) s poudarkom na prevajanju metafor.
Preučujemo stališča študentov in študentk prvega in tretjega letnika angleščine na Univerzi na Reki na Hrvaškem do orodij UI ter ocenimo zmožnosti natančnega prevajanja metafor na platformah Google Translate, ChatGPT in Glosbe. Ugotovitve kažejo na splošno pozitiven odnos študentov in študentk do uporabe orodij UI, a hkrati v raziskavi izstopa tudi pogosta netočnost pri prevodnih metaforah, ki jih prevede UI. Razpravljamo o pedagoških posledicah teh rezultatov, pri čemer poudarjamo didaktični potencial popravljanja napak kot učnega pristopa. Ugotavljamo tudi, da omejitve orodij UI lahko predstavljajo dragocena izhodišča pri pouku, saj spodbujajo kritično razmišljanje, pomagajo pri razumevanju kulturno in kontekstualno zaznamovanega jezika ter prispevajo k izboljšanju jezikovnih spretnosti. Ugotovitve podpirajo nujnost sistematičnega vključevanja orodij UI v pedagoški proces.

Ključne besede: umetna inteligenca v izobraževanju, strojno prevajanje, metafore, popravljanje napak, angleščina kot tuji jezik (EFL)

1 Introduction

Contemporary science in general, and cognitive linguistics in particular, embrace the view that human experience – and, by extension, human language – is profoundly conditioned and shaped by the human body, perception, and culture. More broadly, the relativistic view that language both reflects the conceptual structure of the speaker and simultaneously influences cognitive processes to some degree has been well established and validated within the field (Whorf 1956; Lakoff and Johnson 1980).
According to this perspective, language not only mirrors human thought but also actively contributes to shaping it.1 In the specific context of how language reflects and influences cognition, metaphors are often regarded as pivotal elements that guide thought processes, serving as “switching points” on the rail junctions of our ideas (Lakoff and Johnson 1980). Most fluent speakers of English know that the phrases ‘switching point’ and ‘rail junction’ used in the previous sentence are not to be understood literally – i.e., as referring to the point at a rail junction where trains can be sent in one direction or another. Instead, these two expressions are figurative devices that present metaphors as having the power to direct our ideas towards one among the many possible paths that our cognitive, or thought, processes can create. In fact, in the sentence under scrutiny, we have just resorted to metaphors to explain what metaphors are. In other words, we have compared the thought process to a train journey, and tried to illustrate the capacity of metaphors to direct, redirect and shape our ideas in a certain way (rather than some other possible one) by comparing metaphors to switching points at rail junctions that can send the train (i.e., thought) along different paths. Our aim was to render – as clearly as possible – the idea that the thought process can be strongly directed by metaphors. In fact, as our example illustrates, metaphors are linguistic tools that make us understand one thing (usually a more abstract one – in our case the cognitive process) in terms of another (usually a simpler, more ‘accessible’ one – in our case the train travelling on rail tracks that can and do go in different directions when directed at switching points).
If we now try to translate the sentence ‘Metaphors are switching points on the rail junctions of our ideas’ using a few AI tools,2 we get the following:

a) ChatGPT: Metafore su “točke prebacivanja” na željezničkim čvorištima naših ideja. (a literal translation with the inadequate lexical selections of ‘transfer points’ and ‘rail nodes’/‘railway crossings’)

b) Google Translate: Metafore su “sklopne točke” na željezničkim raskrižjima naših ideja. (a literal translation with the inadequate lexical selections of ‘switch points’ and ‘rail nodes’/‘railway crossings’)

c) Glosbe: Metafore su prekretnice na željezničkim raskrižjima naših ideja. (a literal translation with the inadequate lexical selection of ‘turning point/milestone’ and ‘rail nodes’/‘railway crossings’)

1 See, e.g., the work done in the past few decades by the interdisciplinary Language and Cognition Group (now subdivided into multiple groups) of the Max Planck Institute for Psycholinguistics in Nijmegen on how language influences perception, categorization, and conceptualization. For more information and references, visit https://www.mpi.nl. For a further range of studies on the interplay between linguistic structures and cognitive processes (e.g., the interplay between language and the perception of space and time), see the extensive body of work by Lera Boroditsky (e.g., Boroditsky and Gaby 2010). For recent work, see Maier and Abdel Rahman (2024).

2 These three AI translation tools – Google Translate, ChatGPT, and Glosbe – were chosen because they are the most commonly used tools in Croatia, among both students and professional translators (for more details, see Section 2 below). Google Translate and Glosbe use machine learning algorithms, while ChatGPT relies on advanced language models (like GPT) that involve deep learning.
All three translation versions are too literal and prove contextually and culturally inappropriate in the target language. In all three cases, the machine translation tool attempts to achieve lexical accuracy by proposing a literal translation of what it recognizes as (railroad) technical terms – namely switching point and rail junction (the English ‘switching point’ is rendered into Croatian as ‘transfer point’ in a), ‘switch point’ in b), and ‘turning point/milestone’ in c), while all three tools render ‘rail junctions’ as ‘rail nodes’/‘railway crossings’). At the same time, none of the three versions manages to convey the pragmatic value of the source language metaphoric expressions, and thus all fail to convey the message. All three versions have problems with the two metaphors in the source language sentence (metaphors seen as ‘switching points’ and ideas as having ‘rail junctions’). If we now turn to student translations, we get the following:

d) Translation by students:3 Metafore su skretničari na raskrižjima naših misli. (literally, ‘Metaphors are the switchmen at the junctions of our thoughts’)

Immediately, we note a stark contrast between the AI-generated translations and the human translation. While the former are overly literal and culturally inappropriate, the student version, characterized by a degree of creative liberty (i.e., a slight departure from the source language), functions effectively within the context and is perfectly adapted to the target language and culture. Building on these observations, and within the broader academic discussion surrounding the use of AI tools by English as a Foreign Language (EFL) students and translators (Gašpaović et al. in prep.), as well as the challenges faced by AI in idiomatic translation (Gašpaović et al. in prep.), in this paper we investigate the possibilities and challenges associated with metaphor translation by AI tools.
The issue is explored within the larger, applied context of AI tool usage in the EFL classroom. One of our main aims is to highlight the growing and urgent need to explore and standardize possible applications of AI in the classroom, starting from a detailed understanding of AI and its potential in language learning. The paper is structured as follows: after a brief literature review, we introduce the study, outlining the methodology and results. These results are then discussed, and the implications for AI translation tools in general – specifically in the context of metaphor translation – and their integration into EFL classrooms are addressed. The study concludes with a discussion of the current landscape of AI in pedagogy.

3 All the AI translations of metaphors under investigation in this study were also analysed by the students after they had completed the questionnaire, and the version labelled ‘translation by the students’ refers to the translation that the students agreed upon in class as the best option.

2 Setting the Stage

It is indisputable that the world is currently undergoing a profound digital transformation across all sectors. In terms of both speed and scale, this change can be described as tectonic. A central driving force behind this transformation is Artificial Intelligence (AI), defined here as the simulation of human intelligence processes by computer systems (see Healey 2020). The three translation tools under scrutiny in this paper – Google Translate, ChatGPT, and Glosbe – are all powered by information technology (IT) and are thus frequently referred to in the literature as IT translation tools.
However, in this study we refer to them as AI translation tools because our focus is specifically on their capabilities in terms of how they process and generate language – that is, how these systems perform tasks typically requiring human intelligence, such as language understanding, generation, contextualization, and cultural adaptability. While language processing is one of the most rapidly advancing fields within AI, and AI is increasingly permeating various aspects of life, one surprising area where AI adoption appears to be lagging is language didactics, with the EFL (English as a Foreign Language) classroom serving as a notable example. A considerable number of teachers still regard the implementation and integration of AI tools in the foreign language teaching process as a double-edged sword (this issue is discussed in detail in Section 4 below). Numerous recent studies suggest that the integration of AI tools into second/foreign language teaching should be viewed not merely as a passing trend but as an urgent and growing need (Crompton, Edmett, and Ichaporia 2023; Edmett et al. 2023; Vogt and Flindt 2023). As highlighted in our literature review, while the potential of AI is widely recognized by students and, to a certain extent, teachers, its application in classroom practice remains limited. One region that seems to be reversing this trend is Asia (Crompton, Edmett, and Ichaporia 2023). Meanwhile, most countries within the European Union are still awaiting national policy guidelines on the issue, with notable exceptions such as Sweden (Musk 2022) and the UK (Edmett et al. 2023). A recent study by Vogt and Flindt (2023) demonstrates that even low-threshold AI tools have been integrated into classroom practice only in a limited and hesitant manner, with a general tendency for such tools to be “ignored and excluded from language teaching” (2023, 2).
Furthermore, the integration of AI tools into EFL classrooms remains underexplored, despite its clear importance (Crompton, Edmett, and Ichaporia 2023; Vogt and Flindt 2023). This implies that we are neglecting the vast potential applications of AI tools in foreign language classrooms. AI-supported tools, particularly translation tools, can help students improve their language skills by offering instant translations, exposing them to diverse language structures, and providing immediate feedback (Dizon and Gayed 2021; Farrokhnia et al. 2023; Schmidt and Strasser 2022). Moreover, AI tools can provide personalized learning experiences, adapting to the individual needs of students and offering resources tailored to their specific learning levels (Okolo et al. 2024). AI chatbots like ChatGPT, for instance, can engage students in conversational practice, enhancing their fluency and comprehension (Crompton, Edmett, and Ichaporia 2023; Kazu and Kuvvetli 2023). Finally, AI tools can create innovative and stimulating learning environments and contexts, for example through virtual reality (Chen et al. 2022). Admittedly, the vast potential of integrating AI into EFL teaching comes with certain limitations. Setting aside the many ethical issues, which are outside the scope of this paper, one of the most significant challenges in integrating AI tools into foreign language pedagogy lies in the pragmatic and culturally embedded nature of language, particularly regarding idiomatic expressions and metaphors. As illustrated in the introduction, AI tools are not equipped to provide the meaningful, context-based, and possibly culture-specific interpretations necessary for appropriately translating or explaining culturally and/or contextually loaded phrases (see also Naveen and Trojovský 2024).
These tools often rely on literal translations, overlooking important nuances such as tone, cultural implications, and the pragmatic functions of language in real-world contexts. To reiterate our initial point, metaphors – acting as cognitive and cultural connectors in language – present a significant challenge for AI translation tools. One of the main messages we wish to convey is that this limitation need not be viewed solely in a negative light; instead, it can be regarded as a potentially valuable pedagogical tool. When considering the issue in the context of the critical role of error correction in the EFL classroom (see Khansir and Pakdel 2018), the potential of AI translation tools emerges as far more valuable than initially apparent. These tools are useful pedagogical resources not only when they provide accurate translations, but also when they fail to do so, since these failures can offer powerful opportunities for teaching about culture- and context-specific elements, for the critical evaluation of translations, and for learning overall. This is the central argument we will further explore in the discussion below. Before delving into that, in the central part of this study, we review our research examining student attitudes and habits regarding the use of AI translation tools, as well as the performance of these tools in translating metaphorical language.

3 The Study

Given that students, teachers, and translators alike face numerous challenges related to the use of AI translation tools in their everyday work, we decided to explore some of these pressing issues in greater detail and, at the very least, provide a more comprehensive framework for their future investigation. The study presented in this paper is motivated by the following questions:

1. What are the most widely used AI translation tools among Croatian students of English?
2. Which opportunities and challenges do students recognize regarding the use of these tools for translation purposes?
3. Are students actively encouraged to use AI tools in their translation work, and/or provided adequate guidance in this respect?
4. What guidance would they give to users of AI translation tools?
5. How do the most widely used AI tools cope with the translation of metaphors?
6. What are the implications of all the above for EFL classrooms?

The first four questions were investigated using a questionnaire. To address the fifth question, twenty metaphorical expressions – both cross-linguistically transparent and opaque – were run through the most popular AI translation tools (Google Translate, ChatGPT, and Glosbe), and their English-to-Croatian and Croatian-to-English translations were evaluated in terms of accuracy and cultural appropriateness. The final, sixth question is explored in the discussion section of the paper, drawing on the findings from the first five research questions.

3.1 Methodology

This study employed a mixed-methods approach, integrating both qualitative and quantitative techniques. The initial stage involved a comprehensive literature review, focusing on existing scholarly work related to the implementation of AI tools in EFL classrooms in general, and specifically for the purposes of metaphor translation. The goal of this stage was not only to synthesize existing scholarly discussions but also to assess the level of awareness and identify current areas of interest. This review served as both a stimulus and a foundation for the subsequent phases of the research. As noted in the previous section, this phase highlighted the fact that scholarly awareness regarding the need for and ways of integrating AI into EFL classrooms, particularly with respect to metaphor translation, remains extremely limited.
The next phase involved administering a mixed-format questionnaire (see Appendix), which included both closed- and open-ended questions. This questionnaire was used to collect primary data and assess the current situation regarding the use of AI tools by students in their everyday activities, both in and outside the classroom. The survey was completed by seventy-two university students enrolled in the undergraduate English program at the University of Rijeka, from two different years of study: forty-four students were in their first year, and twenty-eight were in their third year. The closed-ended questions aimed to gather quantitative data on the frequency of AI tool usage, exposure to these tools, and preferences for specific AI translation tools. The open-ended questions were designed to collect qualitative responses regarding participants’ views on various aspects of AI translation tools in the context of EFL teaching and learning. The final stage of the study involved a controlled experiment in which a set of metaphors, ranging from cross-linguistically transparent (i.e., having lexical and pragmatic translational equivalents) to cross-linguistically opaque (i.e., having no straightforward one-to-one translational matches), were translated from English to Croatian and vice versa using several AI translation tools (Google Translate, ChatGPT, and Glosbe). The translations were evaluated one by one based on accuracy, fidelity to the source meaning, and cultural appropriateness of the target expression. The results of these translations were subsequently discussed in class.

3.2 Results

The results are presented below in the order of their appearance on the questionnaire. The responses to the quantitative questions are reported and displayed in graphs, while the answers to the qualitative questions are organized and presented based on the frequency of recurring response themes.
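The per-item scoring in the controlled experiment can be pictured as a simple tally of manual verdicts per tool. The sketch below is purely illustrative: the `ratings` entries, metaphor labels, and verdict categories are hypothetical placeholders, not the study's actual data, which was rated by hand for twenty expressions.

```python
from collections import defaultdict

# Hypothetical manual verdicts for (tool, metaphor) pairs; the real study
# rated translations for accuracy, fidelity, and cultural appropriateness.
ratings = {
    ("ChatGPT", "switching points"): "too_literal",
    ("Google Translate", "switching points"): "too_literal",
    ("Glosbe", "switching points"): "too_literal",
    ("ChatGPT", "metaphor_2"): "adequate",
    ("Google Translate", "metaphor_2"): "too_literal",
    ("Glosbe", "metaphor_2"): "adequate",
}

# Tally how often each tool produced an adequate rendering.
per_tool = defaultdict(lambda: {"adequate": 0, "total": 0})
for (tool, _metaphor), verdict in ratings.items():
    per_tool[tool]["total"] += 1
    if verdict == "adequate":
        per_tool[tool]["adequate"] += 1

for tool, tally in sorted(per_tool.items()):
    print(f"{tool}: {tally['adequate']}/{tally['total']} adequate")
```

Keeping the verdicts as labelled categories rather than numeric scores mirrors the qualitative evaluation described above and leaves room for finer-grained labels (e.g., separating lexical from pragmatic failures) later.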
3.2.1 Use of AI Translation Tools (Question 1)

Both first-year and third-year students reported using AI translation tools in their everyday life and academic work. Accordingly, the answer to Question 1, which asked whether students used AI translation tools, was 100% “yes” in both groups.

3.2.2 Most Widely Used AI Translation Tools (Question 2)

When asked about the most widely used AI translation tools among Croatian students of English at the University of Rijeka, the findings revealed significant differences between first-year and third-year students in their software preferences.

• First-Year Students: Among the first-year students (n = 44), Google Translate emerged as the overwhelmingly preferred translation tool, with 42 students reporting its use. Other tools, such as ChatGPT (8), Glosbe (5), and DeepL (2), were used to a much lesser extent.

• Third-Year Students: In contrast, third-year students (n = 28) favoured Glosbe (24), with Google Translate (17) as the second most popular option. Additionally, online dictionaries (7), DeepL (6), and ChatGPT (4) were also commonly used, while Eudict and Reverso each had one user.

We note that the total number of responses for this question exceeds the number of students who completed the questionnaire. This is because many students reported using multiple translation tools, a behaviour more prevalent among third-year students.

Table 1. Preferred translation tools.

Translation Tool        First-Year Students (n = 44)   Third-Year Students (n = 28)
Google Translate        42                             17
Glosbe                  5                              24
ChatGPT                 8                              4
DeepL                   2                              6
Online Dictionaries     0                              7
Eudict                  0                              1
Reverso                 0                              1

3.2.3 Guidance on AI Translation Tools (Questions 3 and 4)

The responses to Questions 3 and 4 reveal a notable contrast between first-year and third-year students regarding guidance on AI translation tools.
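Because Question 2 allowed multiple answers, the figures in Table 1 count mentions rather than respondents. A minimal sketch of that aggregation, using made-up responses rather than the survey data:

```python
from collections import Counter

# Made-up multi-select answers: each respondent may list several tools,
# so the mention counts can sum to more than the number of respondents.
responses = [
    ["Google Translate"],
    ["Google Translate", "ChatGPT"],
    ["Glosbe", "Google Translate"],
    ["Glosbe", "DeepL"],
]

# Flatten every respondent's list and count each tool mention once.
mentions = Counter(tool for answer in responses for tool in answer)
print(mentions.most_common())
```

Here Google Translate is mentioned three times by only four respondents; counting mentions in this way is why the column totals in Table 1 exceed the group sizes of 44 and 28.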
• First-Year Students: The majority of first-year students (n = 44) reported that they had not received any guidance on the availability or use of AI translation tools during the first three months of their university education, or prior to that.4 This lack of guidance was reflected in their negative responses.

• Third-Year Students: In contrast, the third-year students (n = 28) were divided, with 7 students indicating that they had received guidance on AI translation tools at the university level. This was an unexpected finding, but further investigation revealed that all seven of these students had German as their second major. As part of their German courses, they were introduced to Glosbe, which they subsequently shared with their classmates, explaining the high popularity of Glosbe among third-year students.

3.2.4 Benefits of AI Translation Tools (Question 5)

Regarding the benefits of using AI translation tools, students from both the first and third years highlighted similar advantages, with speed and the multiplicity of options being the most frequently mentioned. Other noted benefits include the following:

• Simplicity and ease of use
• The availability of alternatives, allowing users to choose the best solution
• Tools that sometimes remind users of the appropriate translation
• Vocabulary expansion
• Potential as a learning tool

Since the advantages identified by first-year and third-year students did not differ significantly, the results have been combined into one graph below.

3.2.5 Challenges in Using AI Translation Tools (Question 6)

When it comes to challenges that may hinder the use of AI translation tools, the respondents were almost unanimous in identifying accuracy and issues related to (de)contextualization as major concerns.
Specific challenges mentioned included the following:

• Problems with idioms and metaphors
• Issues with collocations
• Difficulty handling cultural allusions
• Struggles with the pragmatic aspects of language, such as colloquial expressions, proverbs, and personal perspectives

Additionally, some students mentioned less common challenges, such as the following:

• Problems with syntax
• Insufficient data for Croatian, resulting in poorer performance when translating from or into Croatian compared to “major” languages
• A lack of habit in using physical dictionaries and other literature
• A potential narrowing of creativity and autonomy owing to reliance on IT tools

No significant differences were noted between the first- and third-year respondents.

4 The questionnaire was administered in late December 2024, by which time first-year students had completed three months of lectures, as the academic year begins in October. In terms of guidance on the use of AI translation tools, respondents were instructed to consider any form of guidance they had received, both during their time at university and prior to enrolment, at any level of education.

3.2.6 Use of IT Tools for Translating Metaphors (Question 7)

Regarding whether students would use IT tools for translating metaphors, the responses revealed a clear ‘no’ from the first-year students and a more divided response from the third-year students.

• First-Year Students: All 44 first-year students (100%) stated that they would not use IT tools for translating metaphors.

• Third-Year Students: Among the 28 third-year students, 21 (75%) preferred not to use IT tools for translating metaphors, while 7 (25%) were in favour of using them.

It is important to note that while first-year students simply answered ‘no’ to the question of using IT tools for metaphor translation, third-year students often elaborated on both their ‘yes’ and ‘no’ responses.
Positive responses were frequently qualified with phrases such as “Yes, but…” or “Yes, if…”. Common elaborations included the following:
• “Yes, but with caution/guidance/care/a pinch of salt.”
• “Yes, if I cannot think of an equivalent myself / if I am unfamiliar with the source language expression.”
• “Yes, but always double-checking the proposed translation with authentic target language data, teachers, or native speakers.”
On the other hand, most negative responses were followed by explanations that reiterated the limitations of AI translation tools, particularly when dealing with the idiomatic, metaphoric, pragmatic, and cultural aspects of translation – issues already highlighted in the responses to Question 6 (above).

3.2.7 Guidance for Using AI Translation Tools as Future EFL Teachers (Question 8)

Finally, when it comes to the guidance that first- and third-year students, as future English as a Foreign Language (EFL) teachers, would give to their students regarding the use of AI translation tools, the responses were grouped based on common themes, which were then ranked according to their frequency. The most frequent themes identified are listed below, from most to least common:
First-Year Students:
• Use IT tools as a help rather than relying exclusively on them
• Always double-check the solutions proposed by IT tools, as they can make many mistakes
• Compare different sources
• First, study and understand the language, and only then use IT tools
• Use with great caution or moderation; do not overuse them (they will make you “dumb”)
Responses that were unique or only appeared once include:
• Do not use for sentences, only for words
• Do not use them, as they are above your level
• Use them because they can help you better understand a text in a foreign language
• Use them only as an exercise in recognizing mistakes commonly made by others
Third-Year Students:
• Use IT tools, but do not abuse them; they are just a tool, a help, support
• Use more than one IT tool and check the different solutions; triple fact-check; compare and contrast different solutions; always remember that these tools make mistakes all the time
• Do not use IT tools if your knowledge is poor
• Learn to think critically about the proposed translations
• First, learn to work without their help
• I would discourage their use; use as a last resort
• If you get paid for a translation, you must do it yourself – anyone can feed a source language into a tool, but that is not the point
• Best for specific/technical vocabulary
We immediately observe that the responses from third-year students reveal more pedagogically oriented comments compared to those from first-year students. While first-year students mainly focus on cautionary advice, such as using the tools sparingly and cross-checking results, third-year students demonstrate a deeper understanding of the broader educational implications of IT tool usage. They emphasize critical thinking, the importance of building foundational knowledge before relying on tools, and the need to use these resources as a last resort or for specific tasks. This shift toward more pedagogically sound advice reflects their growing awareness of the role of an EFL teacher and their understanding of how to guide students effectively in the classroom. This topic will be discussed in more detail in the next section.
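The grouping-and-ranking step described above amounts to a frequency count over manually coded responses. As a purely illustrative sketch – the theme labels below are invented stand-ins, not the study’s actual codes – the ranking could be computed like this:

```python
from collections import Counter

# Hypothetical coded responses: each student answer has been manually
# assigned one theme label. The labels are invented for illustration.
coded_responses = [
    "use as help, not replacement",
    "double-check outputs",
    "use as help, not replacement",
    "compare different sources",
    "double-check outputs",
    "use as help, not replacement",
]

# Rank themes from most to least common, as described for Question 8.
ranked = Counter(coded_responses).most_common()
print(ranked)
```

The manual coding itself remains a human judgment; only the tallying and ordering are mechanical.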
3.2.8 Accuracy of AI Translation Tools in Translating Metaphors

In order to explore the level of accuracy demonstrated by a selected set of AI tools in translating metaphors, we created two sets of metaphorical expressions: one containing 10 cross-linguistically transparent metaphoric expressions (i.e., metaphorical phrases that have lexical and pragmatic equivalents in both languages), and the other set containing 10 cross-linguistically opaque metaphoric expressions (i.e., metaphorical phrases with no direct one-to-one translational equivalents). These two sets are exemplified below.
Cross-linguistically transparent metaphorical phrases:
1. A double-edged sword – Mač s dvije oštrice (meaning: something that has both positive and negative consequences or effects);
2. The tip of the iceberg – Vrh ledenog brijega (meaning: what is visible or known is just a fraction of the whole);
3. A wolf in sheep’s clothing – Vuk u janjećoj koži5 (lit. ‘Wolf in lamb’s skin’ – meaning: someone who appears to be harmless or trustworthy is actually dangerous and deceitful);
4. To be on cloud nine – Biti na devetom nebu (meaning: extremely happy or in a state of bliss).
Cross-linguistically opaque metaphorical phrases:
1. A silver lining – U svakom zlu neko dobro (lit. ‘In every evil, there is some good’) (meaning: a positive or hopeful aspect to a generally negative or difficult situation);
2. Caught between a rock and a hard place – Između čekića i nakovnja (lit. ‘Between the hammer and the anvil’) (meaning: to be caught in a position where one is pressed from two sides, with no easy way out);
3. Like two peas in a pod – Kao dvije kapi vode (lit. ‘Like two drops of water’) (meaning: two things or people that are extremely similar or identical);
4. Miss the boat / the ship has sailed – Prošla baba s kolačima (lit.
‘the grandmother with the cakes has passed/left’) / Prošao voz (lit. ‘the train has passed’) (meaning: an opportunity has been missed, and it is too late to act now).

All expressions from both sets were translated from English to Croatian and vice versa using three AI translation tools: Google Translate, ChatGPT, and Glosbe. All translations were first evaluated by the teacher for accuracy, fidelity to the source meaning, and cultural appropriateness of the target expression. The results of these translations were subsequently discussed with the students in class.6

5 To give a good example of over-literal translation, let us note here that although both English and Croatian use this idiom, and the metaphorical mapping between good and evil is clear and frequently used, ChatGPT translates the English expression ‘wolf in sheep’s clothing’ as ‘wolf in sheep’s skin’, replacing the culturally usual Croatian ‘lamb’s skin’ with a literal translation of the English ‘sheep’.
6 The evaluations of the translations in class were done after the students had completed the questionnaire.

The results we obtained when translating our two sets of metaphors align closely with the findings presented by Gašparović, Brala-Vukanović, and Brkić-Bakarić (in prep.). In fact, the accuracy of translations of expressions that contain metaphorical language (either metaphorical descriptions or conventional metaphors embedded in idioms, proverbs, etc.) seems to be influenced by three key factors: a) the degree of equivalence between source and target metaphoric expressions; b) the source and target language pair; and c) the translation platform. More specifically, in terms of platform performance, ChatGPT demonstrates a higher level of translation accuracy compared to Google Translate, which, in turn, outperforms Glosbe. It
is worth noting that the study by Gašparović, Brala-Vukanović, and Brkić-Bakarić (in prep.) indicates that Microsoft Copilot surpasses ChatGPT in translation accuracy. However, since our participants did not use Microsoft Copilot, it was not included in this study. Additionally, translation accuracy tends to be higher when translating from Croatian to English rather than from English to Croatian. This is unsurprising, as AI translation tools are trained on large datasets: translating from a language with moderate data resources, such as Croatian, into a language with extensive data resources, like English, poses less of a challenge for AI systems than translating in the reverse direction. Finally, as anticipated, metaphorical expressions with close translational equivalents in the target language were translated more accurately than those without such close (lexical, semantic, and pragmatic) equivalents. Considering the variability in translation quality – from highly accurate and contextually appropriate translations to completely inaccurate or unacceptable ones – the role of the teacher in this process becomes crucial. This issue is discussed in detail in the next section of this paper.

4 Discussion

The data presented in Section 3 reveal interesting trends in the use of AI translation tools among Croatian students of English, with notable differences between first-year and third-year students. These differences reflect both the development of various linguistic and pedagogical skills and the evolving relationship students have with these tools as they progress through their studies. The findings emphasize a (potential) critical link between linguistic competence and reliance on IT (translation) tools. Additionally, they underscore the urgent and growing need for more structured guidance on the use of AI translation tools in the EFL classroom.
At this point, an important observation is that while the use of AI translation tools is widespread among students, this practice is not always mirrored in teaching practices or reflected in the national curriculum. In Croatia, there seems to be a disconnect between the rapid development of these tools and the pace at which educators are integrating them into the EFL classroom. One plausible explanation for this disconnect lies in the generational gap between students, who are more inclined to use technology in general, and teachers, who may be less familiar with or receptive to these tools. Interestingly, while everyday practice shows that teachers frequently use AI translation tools outside the classroom for their own purposes, our data suggest that they remain reluctant to incorporate these technologies into their teaching. This reluctance may stem from concerns about AI tools replacing traditional teaching methods, or from teachers feeling inadequately prepared to integrate them into their repertoire of pedagogical tools (Edmett et al. 2023; Vogt and Flindt 2023). Furthermore, this lack of integration of AI tools in language education is at least in part due to the absence of structured guidance or curricular support for AI integration in the EFL classroom. Given students’ natural inclination towards IT, and given that research has demonstrated the potential benefits of AI translation tools in language learning, including vocabulary improvement, stylistic refinement, and even anxiety control (Crompton, Edmett, and Ichaporia 2023), this lack of structured guidance undermines the many potential pedagogical benefits these tools could provide for both teachers and students.
In particular, deeper analysis and guidance are needed to clearly understand and describe the tasks that AI tools can handle effectively, how these tasks can be incorporated into the classroom, and what exactly remains the role of the teacher and teacher expertise (for a comprehensive discussion of this point, see Edmett et al. 2023). In the context of the current topic, i.e. AI translation of metaphors, understanding how metaphor translation tasks can be incorporated into EFL teaching, and how students and teachers could benefit from this, would allow AI to serve as a useful pedagogical tool. In fact, while our study has shown that AI translation tools may often underperform with metaphorical expressions, teachers should be made aware that AI’s failures in accuracy can be seen as valuable teaching opportunities. In cases of inaccurate AI translation output (e.g., results that are too literal or contextually inappropriate), a detailed error correction process can serve not just to improve students’ language skills, but also to foster students’ cultural sensitivity and their ability to critically evaluate AI translation tools and their output from diverse perspectives.

The lack of curricular guidance on the use of IT (translation) tools means that their pedagogical integration in Croatia remains largely informal and individualized. At the same time, as our data show, when guidance is provided, it is generally well received by students. In fact, while first-year students in our study primarily rely on Google Translate as a convenient tool, third-year students exhibit more varied use, with Glosbe emerging as the most popular choice. This shift is likely due to the guidance they received in their German classes, highlighting how even a minor introduction to AI tools can significantly influence students’ perceptions and usage.
The success of Glosbe among third-year students underscores the importance of structured guidance in AI tool integration, as it can influence how students navigate and evaluate the performance of different translation tools. Moreover, since AI tools develop at varying rates (some showing a decline in performance over time – cf. Gašparović, Brala-Vukanović, and Brkić-Bakarić (in prep.)), it is crucial that students are taught to critically assess and reassess these tools continuously.

Our data also show an interesting evolution in students’ critical attitudes toward AI tools as they advance from the first to the third year, even without structured, formal pedagogical guidance in this respect. While first-year students often approach AI tools with caution, third-year students show a more nuanced understanding of these tools’ limitations, especially in the translation of culturally specific content, such as metaphors.7 This progression seems to reflect their trial-and-error experiences with translation tools, and only sporadic, informal guidance from individual teachers. The third-year students, despite lacking formal pedagogical training, demonstrate a critical approach, acknowledging the inherent limitations of these tools and the necessity of double-checking translations. In this regard, integrating cultural awareness training into the use of AI translation tools – through error analysis and correction of idioms and metaphors, for example – constitutes a step in the right direction.

7 For an interesting study on how students working in groups negotiate limitations in the use of Google Translate, see Rowe (2022).
The current situation, in which the development of critical thinking and critical evaluation skills regarding these tools is left to students’ individual trial-and-error experiences and intuitive awareness, is certainly not the most effective pedagogical method. Given all the above, it would be fair to say that AI translation tools can undoubtedly be useful for basic translation tasks, and students should be encouraged to use them as a complement to their own language skills, rather than a substitute. Instead of simply proposing IT translation tools as potential help, teachers should actively guide students in the practical application of these tools, crucially focusing on their potential as excellent – and de-personalized – error-correction pedagogical opportunities. Notably, the de-personalized nature of AI’s inadequate or inaccurate translations eliminates the personal failure moments that typically occur in a real classroom, thus removing the potential for demotivating students or exposing them to negative and stressful emotions. This is a crucial factor that should not be overlooked when analysing the future role of AI (translation) tools in the EFL classroom.

Providing opportunities for students to critically assess the quality of (a variety of) translations, especially when dealing with idiomatic or metaphorical language, can help improve not only their linguistic and translation skills, but also their wider understanding of the cultural and contextual nuances of language and communication. Error correction as a pedagogical tool has a valid ally in IT. While AI-driven tools can and do make mistakes, the human ability to critically analyse and correct these mistakes, as well as to learn from them, offers an efficient way to engage deeply and productively with language and the translation process in the context of AI tools.
5 Concluding Remarks

In light of the discussion above, it is evident that the integration of AI translation tools into the EFL curriculum is essential and urgently needed. This integration would not only reflect students’ current habits, interests, and needs, but also allow for a more structured and effective use of these tools in language learning. Several key points need to be addressed in this regard: 1) the structured inclusion of AI tools in the curriculum and teacher training on their pedagogical use; 2) continuous monitoring of the development of AI translation tools; and 3) fostering critical thinking in students, particularly in relation to translating culturally specific language.

While AI translation tools offer both opportunities and challenges, their current lack of integration in the classroom limits their potential. By fostering critical thinking and promoting a balanced use of these tools, educators can enhance students’ translation abilities, especially when dealing with the complexities of culturally embedded language, such as metaphors. Even though AI translation tools often struggle with culture-specific expressions, these shortcomings provide excellent opportunities for error-correction-based pedagogy. As AI technology continues to evolve, it is crucial that educators – and before them education policy makers – adopt a proactive, structured approach to integrating these tools into the curriculum. The future of translation and language teaching is inevitably intertwined with AI, and it is crucial that both teachers and students are equipped to navigate this evolving landscape and use these tools in an informed and guided way, so as to make full use of their potential and avoid possible traps.8 Further research is needed to explore how AI translation tools can be more effectively integrated into language learning curricula.
Given their rapid development, investigating exactly how AI tools can be tailored to address specific challenges – such as translating metaphors and other culturally embedded language features – will be crucial. Understanding which pedagogical tasks are best suited for AI and which should remain the (sole) responsibility of the teacher will provide valuable insights into how educators can best prepare students for the challenges and opportunities AI tools present. Ultimately, the gap between AI’s potential and its actual integration in EFL classrooms is not only a technological issue but also a pedagogical one. Future research and teacher training should address both the practical and theoretical aspects of AI tool use in language education. Training programs should not only focus on how to use these tools effectively but also emphasize fostering students’ critical engagement with them. Teaching students to understand the limitations of AI tools, and using these limitations as a pedagogical opportunity, will enhance their language skills and ensure that AI becomes a valuable, guided resource in the EFL classroom.

In conclusion, while many of the issues discussed remain at an intuitive level, their importance and relevance demand immediate scholarly attention and structured curricular guidance. The time to act is now – further delays in integrating AI translation tools into EFL teaching would mean missing a critical opportunity to significantly enhance language learning and teaching in the digital age.

References

Boroditsky, Lera, and Alice Gaby. 2010. “Remembrances of times east: Absolute spatial representations of time in an Australian Aboriginal community.” Psychological Science 21 (11): 1635–39. https://doi.org/10.1177/0956797610386621.
Crompton, Helen, Adam Edmett, and Neenaz Ichaporia. 2023. Artificial Intelligence and English Language Teaching: A Systematic Literature Review. British Council.
Dizon, Gerald, and Jamal M. Gayed. 2021.
“Examining the impact of Grammarly on the quality of mobile L2 writing.” JALT CALL Journal 17 (2): 74–92. https://doi.org/10.29140/jaltcall.v17n2.336.
Edmett, Adam, Neenaz Ichaporia, Helen Crompton, and Ross Crichton. 2023. Artificial Intelligence and English Language Teaching: Preparing for the Future. British Council. https://www.teachingenglish.org.uk/sites/teacheng/files/2024-08/AI_and_ELT_Jul_2024.pdf.
Farrokhnia, Mohammad, Sima K. Banihashem, Omid Noroozi, and Allan Wals. 2023. “A SWOT analysis of ChatGPT: Implications for educational practice and research.” Innovations in Education and Teaching International 61 (3): 460–74. https://doi.org/10.1080/14703297.2023.2195846.
Gartner, Smiljana, and Marjan Krašna. 2023. “Artificial intelligence in education – ethical framework.” 12th Mediterranean Conference on Embedded Computing (MECO), Budva, Montenegro, 1–7. https://doi.org/10.1109/MECO58584.2023.10155012.

8 Among the many challenges discussed in this study, one has remained outside its primary focus – namely, the ethical considerations. Although the present research does not explicitly address the ethical dimensions of incorporating AI technologies into the classroom, this exclusion should not be interpreted as suggesting that the issues explored here can be considered in isolation from the significant ethical concerns that accompany them. For a comprehensive and recent discussion of ethical considerations related to the integration of AI tools into pedagogical processes, see Gartner and Krašna (2023).

Gašparović, Marijana, Marija Brala-Vukanović, and Marija Brkić-Bakarić. In prep. “Idioms and machine translation: Assessing translation accuracy in the age of AI.” Paper submitted to CALICO Journal.
Healey, Justin. 2020. Artificial Intelligence. Volume 450. The Spinney Press.
Kazu, Ibrahim Yasar, and Murat Kuvvetli. 2023.
“The influence of pronunciation education via artificial intelligence technology on vocabulary acquisition in learning English.” International Journal of Psychology and Education Studies 10 (2): 480–93. https://ijpes.com/index.php/ijpes/article/view/1044.
Khansir, Ali A., and Farhad Pakdel. 2018. “Place of error correction in English language teaching.” Educational Process: International Journal 7 (3): 189–99. https://doi.org/10.22521/edupij.2018.73.3.
Lakoff, George, and Mark Johnson. 1980. Metaphors We Live By. University of Chicago Press.
Maier, Martin, and Rasha Abdel Rahman. 2024. “Transient and long-term linguistic influences on visual perception: Shifting brain dynamics with memory consolidation.” Language Learning 74 (1): 157–84. https://doi.org/10.1111/lang.12631.
Musk, Nicholas. 2022. “Using online translation tools in computer-assisted collaborative EFL writing.” Classroom Discourse 13 (2): 119–44. https://doi.org/10.1080/19463014.2021.2025119.
Naveen, Palanichamy, and Pavel Trojovský. 2024. “Overview and challenges of machine translation for contextually appropriate translations.” iScience 27 (10): 110878. https://doi.org/10.1016/j.isci.2024.110878.
Okolo, Chinwe Jane, Chinyere Grace Ezeonwumelu, Chioma Ihuoma Barah, and Ugwu Nnenna Jovita. 2024. “Language education in the age of AI: Opportunities and challenges.” Newport International Journal of Research in Education 4 (1): 39–44. https://doi.org/10.59298/NIJRE/2024/41139448.
Rowe, Lindsey W. 2022. “Google Translate and biliterate composing: Second‐graders’ use of digital translation tools to support bilingual writing.” TESOL Quarterly 56 (3): 883–906. https://doi.org/10.1002/tesq.3143.
Schmidt, Thomas, and Thomas Strasser. 2022. “Artificial intelligence in foreign language learning and teaching: A CALL for intelligent practice.” International Journal of English Studies 33 (1): 165–84.
Vogt, Kerstin A., and Lars Flindt. 2023.
“Artificial intelligence and the future of language teacher education: A critical review of the use of AI tools in the foreign language classroom.” In The Future of Teacher Education: Innovations Across Pedagogies, Technologies, and Societies, edited by P. Hohaus and J.-F. Heeren, 179–99. Brill.
Whorf, Benjamin Lee. 1956. Language, Thought, and Reality: Selected Writings of Benjamin Lee Whorf. Edited by John B. Carroll. MIT Press.

APPENDIX – Questionnaire Administered to Students

1. Do you use IT/computer-assisted translation tools in your work?
2. If yes, which AI translation tool do you use most frequently?
3. Have you ever been exposed to AI translation tools in your English classes?
4. Have you received any guidance from your university lecturers regarding the use of AI translation tools?
5. In your view, what are the benefits of using AI translation tools?
6. In your view, what are the limitations of using AI translation tools?
7. Do you rely on IT tools for the translation of metaphors?
8. In your view, how efficient are IT tools when translating metaphors?
9. As future EFL teachers, what guidelines would you give your students regarding the use of AI translation tools?

Ghodrat Hassani, Marziyeh Malekshahi, Hossein Davari
Damghan University, Iran
2025, Vol. 22 (1), 203-221(228)
journals.uni-lj.si/elope
https://doi.org/10.4312/elope.22.1.203-221
UDC: [81’25:004.89]:659.1(55)

AI-Powered Transcreation in Global Marketing: Insights from Iran

ABSTRACT

This study examines AI-powered transcreation’s role in improving cross-cultural brand communication. We employed GPT-3 to evaluate AI’s ability to enhance global marketing through improved translation and adaptation of brand messages. Traditional translation methods often fail to capture brand-specific emotional resonance across cultures, but AI tools may address this challenge.
Our research compared 10 translation students and 10 professional translators in translating/transcreating brand taglines from Persian to English. An initial test without AI showed professionals outperforming students. After six weeks of GPT-3 training, however, students surpassed professionals, as judged by expert raters using standardized criteria. The findings indicate that targeted AI training can improve transcreation quality. The study also underscores the value of human judgment in crafting prompts and choosing optimal AI outputs. These results also offer insights for translation education, professional training, and global marketing strategies.

Keywords: copywriting, GPT-3, large language model (LLM), marketing translation, transcreation

Transkreacija z umetno inteligenco v globalnem marketingu: spoznanja iz Irana

IZVLEČEK

Študija ugotavlja, kako lahko transkreacija (oz. preustvaritev), podprta z umetno inteligenco, izboljša medkulturno komuniciranje blagovne znamke. Z GPT-3 smo ovrednotili zmožnost UI, da s pomočjo izpopolnjenega prevoda in priredbo oglasnih sloganov okrepi trženje blagovne znamke. Klasični prevajalski pristopi pogosto ne zajamejo kulturno specifične čustvene note posamezne blagovne znamke, medtem ko se orodja UI s tem izzivom lahko spoprimejo. V raziskavi je 10 študentov in študentk prevajanja in 10 profesionalnih prevajalcev in prevajalk prevajalo/transkreativno prilagajalo perzijske oglasne slogane v angleščino. V začetnem preizkusu brez pomoči UI so bili profesionalni prevajalci in prevajalke uspešnejši, po šesttedenskem usposabljanju za delo z GPT-3 pa so študenti in študentke po oceni strokovne komisije, ki je upoštevala standardizirana merila, prehiteli profesionalce. Rezultati kažejo, da ciljno usposabljanje za delo z UI izboljša kakovost transkreacije.
Študija kaže tudi na pomen človeške presoje pri oblikovanju napotkov in izbiri optimalnih odgovorov UI ter nudi tudi nove vpoglede za izobraževanje prevajalcev in prevajalk, strokovno usposabljanje in globalne tržne strategije.

Ključne besede: pisanje oglasnih besedil, GPT-3, velik jezikovni model (LLM), tržno prevajanje, transkreacija

1 Introduction

Global brands face growing challenges in cross-cultural marketing communication, needing more than linguistic translation to ensure brand messages resonate across diverse markets while retaining their core identity. Transcreation, a specialized translation process, adapts marketing content to align with cultural contexts while preserving emotional impact (Díaz-Millón 2021, 159; Díaz-Millón and Olvera-Lobo 2021, 354). However, traditional translation methods often fail to convey brand-specific emotional resonance across cultures, posing a significant obstacle for global marketing. Skilled human translators can address this through careful cultural adaptation, but the process is time-consuming and inconsistent. This challenge has intensified with the rising demand for multilingual content across platforms, from technical documentation to social media (Nimdzi Insights 2022). Large language models (LLMs) like GPT-3 offer a potential solution, with advanced capabilities in generating and adapting natural language across languages. Yet their application to transcreation in marketing remains underexplored, especially regarding how AI tools might improve translator performance in cross-cultural brand communication. This study fills this gap by examining AI-powered transcreation’s role in enhancing cross-cultural brand messaging, focusing on translations from Persian to English for English-speaking North American audiences.
We investigate whether GPT-3-assisted tools can improve the shift from mere translation to effective transcreation of marketing content. The research compares translation students and professional translators in translating/transcreating brand taglines from Persian to English, first without AI support and then after providing the students with targeted training in GPT-3-powered tools. This study’s significance lies in its insights into how AI technologies could reshape translation workflows, particularly in marketing, where cultural nuance is critical. As brands aim to engage diverse global markets, effective AI-assisted transcreation methods could improve cross-cultural communication, potentially lowering costs and increasing efficiency.

2 Translation in an Automated Age

Advanced translation technologies, such as neural machine translation, have raised fears that human translators may become obsolete as automation disrupts the industry (Cronin 2013). Recent scholarship, however, tempers this view. Moorkens (2020) points out ongoing limitations in machine translation, while Pielmeier and O’Mara (2020) highlight new hybrid roles where translators work alongside AI. Although the long-term impact is unclear, translators must adapt to remain relevant in a technology-driven field. AI’s expanding role in areas like legal, medical, marketing, and technical translation makes resisting this technological shift increasingly impractical (Cronin 2013; Łukasik 2024). This change offers benefits, including improved productivity, new market opportunities, and specialized roles, but also presents challenges such as pricing pressures, shifting skill requirements, job security concerns, and the need for continuous training (Olohan 2017).

As businesses aim to connect with global audiences, the rising demand for multilingual content – covering technical manuals, instructional documents, marketing materials, and social media – emphasizes the importance of culturally tailored communication for effective global branding (Way 2020). AI contributes significantly to this field by providing fast, cost-effective, and scalable translation solutions, with the AI language translation market expected to reach USD 7.16 billion by 2029, growing at a 25% CAGR (The Business Research Company 2025). Traditional translation methods, often slow and resource-heavy, are being transformed by AI, particularly generative tools powered by LLMs. These systems speed up content drafting and adapt tone, style, and language to specific needs (Nimdzi Insights 2024). Leading LLMs, such as OpenAI’s GPT, Google’s Gemini, Meta’s Llama, xAI’s Grok, Cohere, Mistral, and Anthropic’s Claude, support over two dozen languages, enabling simultaneous content generation in multiple languages from a single input (Nimdzi Insights 2023). The integration of AI, particularly LLMs, streamlines multilingual content production, enabling businesses to communicate more efficiently with global audiences. In 2024, the Nimdzi 100 report highlighted that 67% of language service providers (LSPs) utilized generic, out-of-the-box AI solutions, such as ChatGPT, while 55% integrated LLMs via APIs into their workflows, significantly enhancing the speed and scalability of AI-generated translations (Nimdzi Insights 2025, 72).

The rise of global brands has increased demand for transcreation, which adapts branded messages to resonate emotionally across cultures (Torresi 2010). Unlike direct translation, transcreation requires expertise in international branding, cross-cultural psychology, and search engine optimization (Mitchell-Schuitevoerder 2020). This need is especially evident in digital spaces, where social media enables brands to engage international audiences directly.
Effective localization shapes consumer perceptions and purchasing decisions, prompting translation firms to offer multilingual copywriting for digital platforms (Nimdzi Insights 2022). Although many translators lack advanced marketing expertise, AI tools like GPT-3 offer a solution. By processing large datasets of marketing content, these models generate localized suggestions that align with target audiences’ emotional expectations. Human translators provide essential oversight, refining AI outputs, ensuring cultural accuracy, and enhancing textual precision. This collaborative approach improves marketing content quality while reducing the need for translators to have extensive marketing knowledge.

3 Transcreation for Marketing Translation

Transcreation goes beyond traditional translation by creatively adapting marketing messages to resonate deeply across cultural boundaries while adhering to legal mandates, such as France’s Toubon Law requiring French in commercial communications. Unlike conventional translation, which seeks to preserve the original message, transcreation reinterprets its essence to suit the target audience’s linguistic and cultural context while maintaining its emotional impact (Díaz-Millón 2021, 159; Díaz-Millón and Olvera-Lobo 2021, 354). Katan (2016, 377) describes this as a “transcreational turn” in translation, emphasizing re-creation in fields like advertising and localization. Katan (2018) further distinguishes transcreation from translation, presenting translators as creators, especially in culturally sensitive contexts. Additionally, Katan and Taibi (2021) frame transcreation within cultural mediation, offering insights into its theoretical and practical roles in translation.

G. Hassani, M. Malekshahi, H. Davari
AI-Powered Transcreation in Global Marketing: Insights from Iran
This perspective builds on Katan’s (2013) exploration of intercultural mediation, which underscores the translator’s role in bridging cultural gaps to facilitate effective communication in diverse settings. This flexible process allows skilled translators to demonstrate creativity by reframing messages to engage local audiences effectively. Transcreation aims to produce content with equivalent impact and emotional connection in another language, even if the text diverges significantly from the original (Bowker 2023, 129). However, this assumption of equivalence raises questions: Can emotional impact be fully replicated across cultural and linguistic divides? Katan (2001) highlights the importance of intercultural competence in ensuring that such adaptations respect cultural differences without compromising the message’s intent, suggesting that effective transcreation requires a nuanced understanding of cultural dynamics. Metrics for assessing equivalence remain subjective and underexplored. Although practitioners may assert emotional parity, empirical studies on cross-cultural audience responses are scarce, challenging the notion of achieving identical emotional resonance in diverse contexts. For instance, Nike’s global campaign in France did not directly translate its slogan Just Do It. Instead, it transcreated it as Fais-le (“Do it”), complying with France’s Toubon Law (Law 94-665 of 1994). While retaining the original’s call to action, Fais-le is more concise and commanding, omitting the “just” qualifier. This creates a bolder, more urgent tone deemed better suited to French cultural preferences, enhancing its motivational impact. However, the nuance of “just,” which softens the encouragement in English, is lost, making Fais-le more directive. Achieving this balance between creative adaptation and fidelity to the original message is critical, as overly free reinterpretation risks diluting the message’s essence. 
Determining appropriate boundaries requires careful human judgment. Systemic functional linguistics, particularly appraisal theory (Martin and White 2005), offers tools to analyse attitudinal language, intensity, and audience engagement. Recent studies using these frameworks reveal how transcreation reconstructs emotional resonance and persuasive meanings across languages (Ho 2024). This balance between creative adaptation and fidelity to the original prompts investigation into whether LLMs like GPT-3 can assist translators in generating varied, culturally tailored marketing copy for further refinement and selection.

4 GPT-3: A Generative Transformer Model Within the Broader LLM Landscape

GPT-3 (Generative Pre-trained Transformer 3), developed by OpenAI in 2020, marks a significant advance in autoregressive generative language models, a subset of natural language processing (NLP) systems. This neural network produces human-like text across varied contexts, but it is only one approach within the diverse field of LLMs. Other architectures serve distinct purposes: encoder-only models like BERT excel in classification and named entity recognition, while encoder-decoder models like T5 handle both classification and generation. This study focuses on generative models like GPT-3 for their relevance to creative marketing translation tasks.

At its core, GPT-3 predicts the next token or sequence based on patterns in its training data, enabling coherent and contextually appropriate text generation (Brown et al. 2020). Unlike other LLM architectures, this predictive capability suits tasks requiring creative output. However, open-ended text generation increases the risk of hallucinations – plausible but incorrect outputs – compared to tasks like summarization or rewriting, where source content guides the model (Bender et al. 2021, 610–12).
Discriminative models like BERT face different limitations, including their reliance on bidirectional context and unsuitability for generating creative text (Devlin et al. 2019, 4171–72). GPT-3’s transformer architecture processes entire contexts simultaneously via self-attention mechanisms, capturing complex cultural patterns, linguistic nuances, and stylistic elements (Vaswani et al. 2017). This enables culturally sensitive translations that preserve meaning and emotional impact, critical for marketing transcreation. Unlike bidirectional encoder models like BERT or versatile encoder-decoder models like T5, GPT-3 uses only the decoder component in a unidirectional (left-to-right) approach, optimizing it for generation. Pre-trained on vast datasets like Common Crawl, GPT-3 develops robust linguistic skills, including grammar, semantics, and world knowledge, by predicting next words across trillion-word corpora. However, its reliance on data without human-like reasoning can lead to biased, unsafe, or factually incorrect outputs, reflecting stereotypes or misinformation in its training data. Without true understanding, GPT-3 may produce harmful or misleading text, especially in high-stakes contexts, necessitating careful human oversight to mitigate ethical, legal, or safety risks (Tamkin et al. 2023, 4–6). Through fine-tuning or few-shot learning, GPT-3 adapts to new topics or styles with minimal examples, making it ideal for marketing content adaptation. This contrasts with models like BERT, which require explicit fine-tuning per task. OpenAI’s generative engines, such as Davinci, Babbage, and Ada, enhance GPT-3’s capabilities for specific applications (Tingiris 2021, 53). Integration into translation tools, like Matecat with GPT-4 for contextual explanations, highlights their creative potential. Skilled users must guide these tools, balancing their strengths and limitations, particularly for transcreation, where cultural nuance is essential. 
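Few-shot adaptation of the kind described above is driven entirely by the prompt text. A minimal sketch of how such a few-shot transcreation prompt might be assembled before being sent to a completion endpoint follows (the demonstration pair is invented for illustration, not taken from the study’s materials, and no API call is made):

```python
def build_fewshot_prompt(examples: list[tuple[str, str]], source_tagline: str) -> str:
    """Assemble a few-shot prompt: each example pairs a source tagline with a
    culturally adapted English version; the final line leaves the completion
    open for the model to fill in."""
    parts = ["Transcreate the tagline for a US audience, keeping its emotional impact."]
    for src, tgt in examples:
        parts.append(f"Source: {src}\nTranscreation: {tgt}")
    parts.append(f"Source: {source_tagline}\nTranscreation:")
    return "\n\n".join(parts)

# Invented demonstration pair; a real workflow would send the resulting
# string to a GPT-3 completion endpoint and post-edit the output.
prompt = build_fewshot_prompt(
    [("Taste the tradition", "Heritage in every bite")],
    "The last tools you will buy",
)
```

The point of the sketch is that GPT-3 needs no per-task fine-tuning: the examples themselves steer tone and register, which is what made prompt engineering a central skill in the training described later.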
5 Research Context and Methodology

As language technologies advance rapidly, they promise to transform text production and translation practices. In response, Iran’s undergraduate translation program revised its curriculum in 2017 to focus on emerging technologies, preparing students for a technology-driven industry. The updated curriculum replaced outdated courses with subjects like Translation and Technology, Translation Market, and Emerging Trends in Translation. These courses aim to build skills in using modern tools, understanding technology’s impact on the field, and mastering high-demand areas such as social media, website localization, transcreation, and copywriting.

This study investigates whether GPT-3 applications can enhance translation students’ transcreation skills within Iran’s technology-oriented curriculum. The revised national curriculum (Ministry of Science, Research and Technology 2022) at the BA level emphasizes localization, marketing translation, and the integration of advanced technologies in coursework, aligning with global trends in translation education that increasingly adopt AI tools (Mellinger 2019; Rodríguez-Castro 2018). While research on AI in translation education exists, studies on LLM applications for transcreation are limited. Daems and Macken (2019) reported improved outcomes from incorporating neural machine translation into training, and Kenny and Doherty (2014) provided frameworks for technology-enhanced pedagogy, which we adapted for our GPT-3 approach. Our study compares the quality of student translations using GPT-3 as a supplementary tool against those by professional translators without AI support, assessing whether such technologies can help students produce transcreations comparable to those from professionals.
5.1 Participants

Ten students in their final year of BA translation studies were chosen from the 27 enrolled in the Emerging Trends in Translation course at Damghan University, Iran. These students had completed prerequisite courses in Translation and Technology and Translation Market, covering AI in translation, audience analysis for digital marketing, and the role of content creation in modern translation.

At the study’s start, advanced chatbot interfaces like ChatGPT (powered by GPT-3.5) or later LLM-based tools (e.g., GPT-4) were not widely available. Integrating such tools into coursework was impractical, owing to limited access, fixed curricula, and ongoing classes, and switching tools mid-course could disrupt learning. Thus, we opted for specialized GPT-3-based applications tailored to our pedagogical goals, ensuring a consistent student experience. Students qualified through a screening exam testing proficiency in prompt¹ engineering, output refinement, and translating marketing texts using tools like CopyAI and Yaara. Only 10 of the 27 students achieved the required 80% score. Lack of prior GPT-3 experience was a mandatory criterion, ensuring a baseline comparable to that of the professional group.

The second group included 10 professional freelance translators. Initially, we sought specialists in transcreation and marketing translation, defined as professionals with at least 50% of their workload in these areas, formal marketing communications training, and 25+ transcreation projects for international brands. Because of recruitment challenges, we broadened the criteria to include translators with at least 5 years of full-time experience on diverse projects, including marketing materials. Screening confirmed all had completed at least 10 marketing translation projects, though such work comprised less than 30% of their portfolios.
Professionals were recruited from buyers of Yaademy’s computer-assisted translation (CAT) tool video tutorials, where the lead researcher is a technical consultant and curriculum developer. Tutorial costs were refunded to encourage participation. All professionals reported no prior experience with GPT-3 or related AI tools, a prerequisite for inclusion. Their highest qualifications were bachelor’s or master’s degrees in translation studies or related fields.

¹ A prompt refers to the initial text input given to an AI language model that serves as context or instructions for the model to generate a relevant response.

The sample size of 10 students and 10 professionals is a limitation. A larger sample would improve statistical power and generalizability. Constraints included the intensive GPT-3 training, detailed qualitative evaluations, and the difficulty of recruiting experienced professionals willing to participate. Future studies should use larger samples to confirm these findings.

5.2 Procedure

The study employed a mixed experimental design with a pre-test and post-test translation task, conducted under timed conditions by both the student group and the professional translators. In the pre-test, all participants, native Persian speakers, independently translated three brand motto taglines from Persian to English without GPT-3 access. The study ran from September to November 2022, with a 6-week training and experimentation period from mid-September to late October. Translations targeted English-speaking North American markets, primarily the United States, as specified in the translation brief. This focus reflects the ambition of Iranian companies to expand into Western markets, particularly the U.S., a key branding destination, despite political challenges. The selected companies were DottleBox (an ashtray producer), RareRead (a bookstore), and SharpPoint (a fishing equipment manufacturer).
Although the BA students had coursework in content creation and copywriting, neither group had prior GPT-3 training, making the pre-test a baseline of unaided translation ability. Following the pre-test, the student group received 6 weeks of intensive training on GPT-3 tools, including CopyAI, Texta.ai, and Yaara, through two 90-minute weekly sessions led by the researchers. While the researchers lacked formal AI content creation training, one specialized in technology and translation, staying informed on industry trends through seminars and webinars. Training focused on prompt engineering for marketing text and best-practice guides, with exercises generating outputs like product descriptions, meta descriptions, mottos, and captions.

5.3 Client Specifications

The translation brief provided to participants included specific client requirements to ensure the transcreated mottos aligned with marketing goals for North American audiences. Clients specified that mottos should be concise, using fewer than seven words to enhance memorability and recall, critical for effective brand communication. Additionally, mottos were required to reflect the brand’s core identity, be easily understandable, and resonate emotionally with U.S. consumers. These criteria – conciseness, branding representativeness, comprehensibility, and memorability – formed the basis for the evaluation rubric used by expert raters, ensuring translations met the clients’ expectations for culturally tailored, impactful marketing content.

5.4 Experiment Design

Of the 27 initial students, only 10 demonstrated sufficient post-training proficiency, assessed via a practical evaluation requiring tailored marketing content creation across three hypothetical brand scenarios using GPT-3 tools. Researcher-designed rubrics evaluated prompt engineering, output refinement, and localized translation quality.
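The client criteria described in the brief lend themselves to a simple programmatic representation. A minimal sketch follows (the class and function names are ours, for illustration; this is not the study’s actual scoring instrument):

```python
from dataclasses import dataclass


@dataclass
class MottoScore:
    """One rater's scores on the four client criteria, each out of 5 points."""
    conciseness: float
    representativeness: float
    comprehensibility: float
    memorability: float

    def total(self) -> float:
        # Out of 20, matching the rubric total described in the text.
        return (self.conciseness + self.representativeness
                + self.comprehensibility + self.memorability)


def meets_length_spec(motto: str, max_words: int = 7) -> bool:
    """Client brief: mottos should use fewer than seven words."""
    return len(motto.split()) < max_words
```

Only the length criterion is mechanically checkable; the other three require human (or rater-panel) judgment, which is why the study relies on expert scoring.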
Students needed an 80% score to qualify for the treatment group, ensuring that only those with strong skills influenced post-test outcomes. The professional group received no GPT-3 exposure or training, allowing a direct comparison of specialized AI training’s impact on student performance against professional standards. This design isolated the effect of GPT-3 proficiency on translation quality relative to established expertise.

Six weeks after the pre-test, qualified participants completed a post-test translation task designed as a simulated client request. They received background summaries for three companies aiming to adapt Persian motto taglines into memorable English slogans for North American consumers, particularly in the U.S. The companies were Match-Lite, a match producer since 1917, modernizing its motto “بهترین را انتخاب کنید” (Always choose the best); VidEdu, a youth-focused e-learning platform with the motto “یادگیری ساده و صمیمی” (Easy and friendly learning); and GentleTools, a toolbox manufacturer emphasizing durability with the motto “آخرین ابزاری که می‌خرید” (The last tools you will buy). Briefs included textual brand histories, style guidelines, and visual marketing materials like ads showing product use. (Company names are pseudonyms for privacy.)

Participants were instructed to create culturally resonant mottos tailored to U.S. consumer preferences. Students worked independently in a monitored computer lab, using personal resources like online dictionaries, translation memory databases, brand research, style guides, and glossaries. Professionals, geographically dispersed up to 1,000 miles apart, completed the task remotely within the same timeframe. To ensure consistency, their activities were tracked via screen recording and timed submission protocols, despite the differing settings.

A panel of five university translation professors with doctorates and expertise in marketing translation evaluated the translations. With over 60 years of combined experience in cross-cultural psychology, international marketing, copywriting, and advertising, the raters used a standardized rubric assessing four metrics – conciseness, branding representativeness, comprehensibility, and memorability – each scored up to 5 points for a 20-point total, aligning with Iran’s tertiary education standards. A pre-scoring norming session ensured consistent application of the criteria through exemplar discussions. To prevent bias, raters were blinded to translator group and GPT-3 use, with mottos presented in random order and identified by number.
Inter-rater agreement, measured by Fleiss’ kappa, was 0.79, indicating substantial agreement per Landis and Koch (1977). Some disagreement arose over conciseness, reflecting the subjective nature of translation quality assessment, even with structured rubrics (Bayer-Hohenwarter 2011; Hassani 2011; Doherty 2017). Raters also provided qualitative feedback on motto strengths and weaknesses, complementing the numeric scores and enriching insights. The evaluation’s rigor – standardized rubrics, expert raters, multi-method assessment, and inter-rater reliability – bolstered the validity and reliability of the translation quality assessment. Additional details on software tracking, remote monitoring, experimental conditions, and GPT-3 tool selection, while implemented, are omitted as they are not essential to the core methodology.

6 Findings and Discussion

While the pre-test results offer meaningful insights, our presentation of the findings centres primarily on the results of the post-test translation task. This allows us to concentrate on the main interest of this study – assessing the impact of GPT-3 tools on marketing translation quality after specialized training. Additionally, in the interest of brevity, only the salient details of the post-test most vital to conveying the key quantitative and qualitative findings are highlighted; further granular specifics are omitted.

Table 1. Total Scores for Student and Professional Groups on the Pre-test and Post-test (Total: 20).

Student Pre-test        Student Post-test        Professional Pre-test    Professional Post-test
DottleBox      6.2      MatchLite      12.1      DottleBox      11.4      MatchLite       8.2
RareRead       8.5      VidEdu         14.2      RareRead       11.9      VidEdu         12.3
SharpPoint     7.4      GentleTools    13.9      SharpPoint      9.8      GentleTools    14.9
Mean          7.36      Mean           13.4      Mean          11.03      Mean           11.8

To situate the post-test results, the pre-test performances establish an informative baseline.
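For reference, the Fleiss’ kappa statistic used above to gauge rater agreement can be computed from a subjects-by-categories count matrix. A minimal sketch with invented toy data (not the study’s actual ratings):

```python
def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa for a subjects x categories matrix, where counts[i][j]
    is the number of raters assigning subject i to category j."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    # Observed agreement: mean per-subject agreement P_i
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_subjects
    # Chance agreement: from the overall category proportions
    col_totals = [sum(row[j] for row in counts) for j in range(len(counts[0]))]
    p_e = sum((t / (n_subjects * n_raters)) ** 2 for t in col_totals)
    return (p_bar - p_e) / (1 - p_e)

# Toy data: five raters, three items, two categories; unanimous raters
# yield kappa = 1.0.
unanimous = [[5, 0], [0, 5], [5, 0]]
```
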
As shown in Table 1, the professionals initially outperformed the students by nearly 4 points before GPT-3 training, with average scores of 11.03 and 7.36, respectively. For the DottleBox brand of ashtrays, professionals scored 11.4 points versus 6.2 for the students. Similarly, for RareRead, a bookstore specializing in rare books, professionals received 11.9 points, while students managed 8.5. This decisive advantage persisted for SharpPoint, a fishing hook manufacturer, where professionals led students by 2.4 points. Professionals seem to have leveraged extensive real-world experience to demonstrate superiority in all metrics during the unaided pre-test translation. During the evaluation process, all translations were randomized and identified only by number, with raters blinded to translator group membership and GPT-3 usage to prevent potential rating bias based on expected group differences.

However, the narrative shifted dramatically in the post-test, after students received specialized training in GPT-3-powered applications. They gained a striking 6.04-point boost over their pre-test performance, while professionals improved by a marginal 0.77 points. Ultimately, students secured a 1.6-point post-test advantage over the professionals, exhibiting marked gains across most metrics.

Delving deeper into the post-test results yields additional insight. Students substantially outperformed professionals in all metrics when translating the MatchLite motto. However, for VidEdu, this margin narrowed, and for GentleTools, professionals exceeded students in certain metrics, like conciseness. Conciseness in this context refers to the ability to convey the brand message in a minimum of words while maintaining impact: specifically, mottos using fewer than seven words scored higher. Still, when compiled in aggregate across all companies, students secured decisive leads in every metric, with the widest gap observed in conciseness.
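The group means and gains cited here follow directly from the per-brand totals in Table 1, as a quick arithmetic check shows:

```python
def mean(values: list[float]) -> float:
    return sum(values) / len(values)

# Per-brand totals (out of 20) from Table 1
student_pre = [6.2, 8.5, 7.4]          # DottleBox, RareRead, SharpPoint
student_post = [12.1, 14.2, 13.9]      # MatchLite, VidEdu, GentleTools
professional_pre = [11.4, 11.9, 9.8]
professional_post = [8.2, 12.3, 14.9]

student_gain = mean(student_post) - mean(student_pre)    # about 6.0 points
post_gap = mean(student_post) - mean(professional_post)  # about 1.6 points
```
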
G. Hassani, M. Malekshahi, H. Davari: AI-Powered Transcreation in Global Marketing: Insights from Iran

While variances emerged across brands, preventing simplistic interpretations, the aggregated data suggests that GPT-3 tools enabled students to produce higher-scoring motto translations overall. The abbreviations S/P, C, R, U, M, and OA represent the student translator or professional translator, the average scores for conciseness, representability, comprehensibility (U standing for the synonymic expression “understandability”), memorability, and the overall average score, respectively.

In the case of Match-Lite, as presented in Table 2, the top three mottos as rated by all five evaluators were: A match from hell itself! (overall score 20), Let there be light (18), and Alight from the WWI (15). A match from hell itself! scored highly for its cheekiness, wit, and modernity, which contributed to its high memorability. Let there be light received full points from four raters for conciseness, representativeness, and memorability, although one deducted points for limited comprehensibility among non-religious people. The biblical reference resonated as a familiar, relatable story. Alight from the WWI was praised for conciseness, representability, and comprehensibility, but scored lower in memorability because of its lack of rhythmic, catchy phrasing. The historical reference to Match-Lite’s 1917 founding was appreciated. Two other top mottos scored 14 points. A match as cool as its name lost points for conciseness, at seven words. The raters were struck by the contradiction of a “cool” yet burning match. The other 14-point motto, A lifetime at your fingertips, was praised for evoking Match-Lite’s heritage and implying availability, but some found the phrasing vague.

Table 2. Mottos & Scores: Match-Lite.[1]

No. | Motto                                            | S/P | C  | R  | U    | M    | OA
1   | A match from hell itself!                        | S   | 5  | 5  | 5    | 5    | 20
2   | Let there be light.                              | S   | 5  | 5  | 3    | 5    | 18
3   | Alight from the WWI[2]                           | S   | 5  | 4  | 3.5  | 2.5  | 15
4   | A match as cool as its name                      | S   | 2  | 4  | 4    | 4    | 14
5   | A lifetime at your fingertips                    | S   | 5  | 2  | 3    | 4    | 14
6   | A flame to light your life                       | S   | 5  | 3  | 3    | 2    | 13
7   | A company rich in history but young at heart     | P   | 1  | 2  | 4    | 4    | 11
8   | Match-Lite, A Matchless Match                    | P   | 4  | 1  | 2    | 3    | 10
9   | Goodnight sweetheart                             | P   | 4  | 0  | 2    | 4    | 10
10  | Life is short, match a smile                     | P   | 3  | 1  | 3    | 3    | 10
11  | Match-Lite: Tradition & Innovation               | S   | 4  | 1  | 3    | 2    | 10
12  | From the Ashes to the Top                        | S   | 3  | 3  | 1    | 2    | 9
13  | Match-Lite, your number one choice               | P   | 3  | 0  | 4    | 1    | 9
14  | Where fire and flame are our business            | P   | 0  | 3  | 3    | 2    | 8
15  | Match-Lite strikes anywhere you want             | P   | 2  | 2  | 2    | 1    | 7
16  | Enjoy the relaxation of a striking flame         | P   | 0  | 3  | 2    | 1    | 6
17  | It’s not just a light, it’s an experience.       | P   | 0  | 2  | 3    | 1    | 6
18  | Match-Lite is where Safe Protection Meets Beauty | P   | 1  | 1  | 2    | 1    | 5
19  | Match-Lite, always for your biggest adventures   | S   | 0  | 0  | 2    | 2    | 4
20  | The resource for professional fireworks displays | S   | 0  | 1  | 2    | 1    | 4
    | Total Students                                   | N/A | 34 | 28 | 29.5 | 29.5 | 121
    | Professionals                                    | N/A | 19 | 15 | 27   | 21   | 82

[2] The phrase “Alight from the WWI” contains grammatical errors (inappropriate preposition “from” and unnecessary definite article “the”). This AI-generated motto was included in our survey in its original form to maintain data integrity. The errors highlight the need for human oversight in AI responses.

However, we acknowledge that assessment of marketing text quality inherently involves subjective elements. Despite rigorous rubrics and norming sessions, individual rater preferences, cultural backgrounds, and personal interpretations of effective marketing language inevitably influenced scoring decisions. Analysis revealed inconsistencies in conciseness ratings for motto translations. For instance, translation #7 in Table 2 (9 words) scored an average conciseness of 1, while translation #20 (6 words, meeting client specifications) scored 0. Despite high inter-rater agreement, raters’ divergent views on conciseness – shaped by individual backgrounds, cultural perspectives, or preferences – caused discrepancies. Some prioritized word count, while others valued impactful longer phrases. To improve consistency, future studies should define conciseness clearly, e.g., a seven-word limit or balancing brevity with emotional resonance. Rater training is crucial to align scoring criteria and ensure reliable evaluations (Han 2020, 267–68; Doherty 2017, 142).

Table 3. Mottos & Scores: VidEdu.

No. | Motto                                                      | S/P | C  | R  | U  | M  | OA
1   | Learn. Earn                                                | S   | 5  | 5  | 5  | 5  | 20
2   | Learn$$$                                                   | S   | 5  | 5  | 5  | 4  | 19
3   | Education Reimagined                                       | P   | 5  | 5  | 4  | 4  | 18
4   | A “Guru” at your fingertips                                | P   | 4  | 4  | 5  | 4  | 17
5   | Ready for your next AHA moment?                            | P   | 4  | 4  | 4  | 5  | 17
6   | Your gateway to knowledge                                  | S   | 5  | 3  | 4  | 4  | 16
7   | From zero to hero                                          | S   | 5  | 3  | 4  | 4  | 16
8   | Imagine. Learn. Succeed.                                   | S   | 5  | 4  | 4  | 3  | 16
9   | Learn anytime, anywhere.                                   | S   | 5  | 4  | 4  | 3  | 16
10  | Get smarter faster                                         | S   | 5  | 3  | 4  | 3  | 15
11  | Learn today. Lead tomorrow.                                | P   | 4  | 4  | 3  | 3  | 14
12  | Be pro. Be seen. Be noticed.                               | P   | 3  | 2  | 4  | 4  | 13
13  | VidEdu – Learning is easy.                                 | P   | 4  | 3  | 4  | 2  | 13
14  | Education Powered by VidEdu                                | P   | 4  | 3  | 4  | 2  | 13
15  | Wanna learn?                                               | S   | 5  | 2  | 3  | 3  | 13
16  | VidEdu, one-stop shop for online education                 | P   | 1  | 3  | 3  | 3  | 10
17  | Cutting through the clutter of the Internet                | S   | 0  | 1  | 4  | 3  | 8
18  | You’ve come to the right place.                            | P   | 0  | 0  | 3  | 1  | 4
19  | Your source for quick, easy and AFFORDABLE video tutorials | P   | 0  | 2  | 2  | 0  | 4
20  | Enjoy the ride on your journey with VidEdu.                | S   | 0  | 2  | 1  | 0  | 3
    | Total Students                                             | N/A | 40 | 32 | 38 | 32 | 142
    | Professionals                                              | N/A | 29 | 30 | 36 | 28 | 123

Examining the total points awarded for each parameter by translator group reveals two key takeaways. First, while comprehensibility and memorability scores differ by 6 points for professionals, students scored identically in both. Whether this results from using GPT-3 applications is debatable.
Second, the most significant gap between groups is in conciseness (15 points), while comprehensibility differs little (2.5 points). So, while GPT-3 appears to substantially improve conciseness, comprehensibility gains seem marginal. When looking at the top VidEdu mottos, the student-generated Learn. Earn. scored the highest with 20 points. Close behind in second was Learn$$$ with 19 points, docked 1 point for recognition value despite its aesthetic appeal. For the professional mottos, the top three contenders were Education Reimagined (18 points), A “Guru” at your fingertips (17 points), and Ready for your next AHA moment? (also 17 points). Students rounded out places 6–10 before the professional and student mottos began intermingling in the rankings below the top 10 (Table 3). Examining the total points given for each metric reveals two notable findings. First, both student and professional groups exhibited markedly higher average scores across all assessment criteria relative to the Match-Lite motto translations. Students demonstrated the most pronounced score increase in comprehensibility (up 8.5 points), while professionals saw the greatest score growth in branding representativeness (up 15 points). Second, whereas students had outperformed professionals by a sizeable 39-point margin in Match-Lite, this score differential shrank dramatically to just 19 points for the VidEdu motto. The reasons behind the professionals’ stronger VidEdu performance are largely ambiguous but potentially attributable to their comparatively greater real-world experience with adapting messaging for the education sector. Meanwhile, students’ elevated scores may partially stem from the disproportionately abundant textual data on education versus matches in GPT-3’s training corpus. The vastly larger volume of education-related material likely enabled GPT-3 to generate more context-appropriate suggestions tailored to an education-focused brand like VidEdu.
For GentleTools, two mottos received full marks: Blessed are the Gentle and Boring Done Fun! Evaluators praised the former for alluding to the biblical verse on the meek inheriting the earth. While the Bible quotation Let there be light ranked second for Match-Lite, having lost 2 comprehensibility points, the raters felt the GentleTools verse would resonate more universally. Interestingly, the student who proposed these winning mottos ranked only 14th in the pre-test phase. However, after gaining access to GPT-3 tools, she seems to have experienced a boost in creativity. She cleverly utilized CopyAI’s “more like this” feature to generate the biblical motto for GentleTools, modelling it after her own Match-Lite entry that had ranked second. Evaluators ultimately rated her GentleTools motto as the top suggestion, while her Match-Lite entry took second place. Once again, this example highlights the fact that interpreting qualitative branding metrics inevitably allows some rater discretion. As Bayer-Hohenwarter (2011, 97) explains, “the subjective has to be acknowledged as an inevitable ingredient in any TQA recipe.” Raters also appreciated the play on words in Boring Done Fun! around tools for boring and boring as the opposite of fun. The next mottos, You break it. We fix it (student) and When the tough get going! (professional), scored closely behind. Professionals collected the next 5 spots, #5–9. Table 4 breaks down the GentleTools motto ratings.

Table 4. Mottos & Scores: GentleTools.

No. | Motto                                       | S/P | C  | R  | U  | M  | OA
1   | Blessed are the Gentle                      | S   | 5  | 5  | 5  | 5  | 20
2   | Boring Done Fun!                            | S   | 5  | 5  | 5  | 5  | 20
3   | You break it. We fix it.                    | S   | 4  | 5  | 5  | 5  | 19
4   | When the tough get going!                   | P   | 4  | 4  | 5  | 5  | 18
5   | Riding the wave of industrialization        | P   | 4  | 5  | 4  | 4  | 17
6   | The dream of a craftsman                    | P   | 4  | 4  | 4  | 5  | 17
7   | GentleTools: A Solid Choice                 | P   | 5  | 3  | 4  | 5  | 17
8   | WORK SMARTER NOT HARDER                     | P   | 5  | 4  | 4  | 4  | 17
9   | Gentle as a butterfly, stinging as a bee.   | P   | 1  | 5  | 5  | 5  | 16
10  | Quality in, Quality out.                    | S   | 5  | 2  | 5  | 3  | 15
11  | The right tool for the right job            | S   | 2  | 5  | 5  | 3  | 15
12  | Tough tool for a tough job                  | S   | 3  | 4  | 4  | 4  | 15
13  | Exceptional design. Exceptional durability. | P   | 4  | 2  | 4  | 3  | 13
14  | Inspiration for innovation                  | S   | 4  | 2  | 4  | 3  | 13
15  | GentleTools: A name you can trust.          | S   | 2  | 1  | 5  | 4  | 12
16  | Built to last a lifetime                    | P   | 4  | 2  | 3  | 3  | 12
17  | GentleTools: where power meets integrity    | P   | 3  | 3  | 3  | 2  | 11
18  | Unrivalled Function. Unbeatable Value       | P   | 5  | 1  | 3  | 2  | 11
19  | Don’t let their small size fool you.        | S   | 0  | 2  | 2  | 4  | 8
20  | Gentle Like a Woman. Tough Like a Man.      | S   | 0  | 1  | 1  | 0  | 2
    | Total Students                              | N/A | 30 | 32 | 41 | 36 | 139
    | Professionals                               | N/A | 39 | 33 | 39 | 38 | 149

Two evaluators made an astute observation regarding motto quality: the various motto parameters must work cohesively rather than be assessed fully in isolation for maximal impact. In other words, a motto may score highly in the individual metrics of conciseness, representativeness, comprehensibility, and memorability, yet still fail to be a compelling motto when those components do not synthesize into a cohesive and impactful whole. Simply excelling at each separate criterion does not guarantee that the final assembled motto will resonate powerfully with audiences. Evaluators emphasized that a transcreated motto’s success hinges on achieving a harmonious gestalt where the distinct elements coalesce into a seamless brand statement that lands persuasively. For example, Gentle Like a Woman, Tough Like a Man may be memorable and comprehensible, and may represent the brand reasonably well, but it perpetuates harmful gender stereotypes that could alienate customers, warranting a low overall score. In contrast, the similarly structured motto Gentle as a butterfly, stinging as a bee resonated more fittingly with GentleTools’ desired branding of admirable yet non-toxic masculinity.
It directly references the iconic motto Float like a butterfly, sting like a bee coined by legendary boxer Muhammad Ali to describe his graceful yet hard-hitting fighting style. Evaluators felt this motto expertly conveyed GentleTools’ branding aim to celebrate Ali’s principled model of conviction and resilience in line with themes of durability. By promoting Ali’s strength of character and principles, the adapted motto aligned admirably with the company’s desired values. It received high ratings of 5 in three metrics and lost points only in conciseness. This culminated in an apt overall score of 16 points. This example illustrates how a motto’s overall impression can exceed the sum of its individual trait ratings, a phenomenon noted in complexity theory (Blumczynski and Hassani 2019; Marais 2021; Marais and Meylaerts 2022). However, this observation arose unexpectedly since the raters had been expressly instructed to appraise isolated qualities, not holistic value. Including both detailed dimensional ratings and consolidated overall scores could have allowed us to explore this effect. Additionally, as powerful as LLMs like GPT-3 may be, they are not immune to perpetuating the genuine societal biases that are embedded in their training data. This raises critical ethical considerations regarding the responsible deployment of such AI systems. The biases exhibited by language models like GPT-3 reflect a broader issue afflicting AI systems across domains. For instance, AI-powered image generation tools like Midjourney or OpenAI’s DALL-E also show such biases; when asked to generate an image of a CEO, they usually depict a white male, likely mirroring stereotypical patterns in their training data. Conversely, in what seems like an over-correction driven by “wokism” interests to counteract this failure, Google’s Gemini model exhibits the opposite tendency. 
According to The Economist (2024), “[t]he tech giant’s new artificial-intelligence model invents black Vikings and Asian popes” in an apparent attempt to diversify representations. Similarly, the problematic gender stereotype suggestion Gentle Like a Woman, Tough Like a Man from GPT-3 likely stems from ingrained biases present in its textual training corpora. The model essentially amplifies societal patterns it discovers in the data. This concern is not limited to language models alone. Studies have shown that facial recognition tools exhibit poorer performance on minority groups when trained on datasets overrepresenting majority demographics (Howard et al. 2022). Just as racial biases emerge in some computer vision systems, language models that internalize imbalanced representations can propagate gender and other biases. Given the expanding global use of AI translation technologies, like e-commerce companies localizing for international audiences, the implications of circulating biased outputs could prove reputationally and financially detrimental (Zhang et al. 2021) to the companies deploying these systems and potentially harmful to the diverse consumer groups they aim to serve. More broadly, scholars like Bostrom (2014) and Harari (2018) have flagged threats of AI dominance across society.

7 Findings Based on Student Feedback

Student participants offered crucial insights into using GPT-3 tools. Their feedback came through comments, tracking data, and audio messages, allowing triangulation across multiple qualitative sources. Surprisingly, despite both groups having equal opportunity to comment, most responses came from the students rather than the professionals. Students’ perspectives, derived from diverse feedback channels, inform some of this study’s key findings, including the following: With proper training in leveraging these AI tools, student translators managed to surpass professionals in some key translation quality metrics.
However, fully harnessing the potential of LLMs like GPT-3 relies wholly on human discernment and skill. Students shared illuminating examples from working with the applications. One student was tasked with translating the VidEdu motto Easy and friendly learning from Persian to English. His initial plain prompt yielded uninspired suggestions like Simple and amicable education. However, after researching VidEdu’s fun, animated video-based courses, the student enhanced the prompt with vivid contextual details. This sparked creative alternatives like Engaging courses for the YouTube generation! and Making learning as fun as YouTube. Without added context, these suggestions remained lacklustre. But small tweaks adding colour significantly improved results, though conciseness specifications still prevented selecting the most aesthetic options. Additionally, students emphasized thoroughly researching branding tone and context when formulating prompts. For example, when translating the slogan for GentleTools, an outdoor toolbox company, students learned that energetic, rugged prompts yielded suggestions conveying durability like Built to endure the elements. However, more refined prompts generated incompatible suggestions alluding to luxury. Elaborating further, students observed differences even across AI tools. When inputting the same GentleTools prompt into CopyAI versus Texta.ai, noticeably distinct suggestions emerged. CopyAI proposed rugged slogans fitting the durable brand, such as Equipped for adventure. In contrast, Texta.ai generated refined suggestions alluding to luxury, like The finest instruments for the discerning craftsman. The reason these applications produce varying suggestions likely stems in part from differences in their settings.
As discussed previously, while these tools share the GPT-3 foundation, factors like the AI engine (Curie, DaVinci, etc.) and hyperparameters such as temperature still differ, impacting text generation qualities. Temperature refers to how deterministic or variable the responses can be (Tingiris 2021, 54–57). Because each application’s settings and configuration shape its outputs, different tools may serve different purposes or handle certain cases more effectively, which underscores the value of extensive experimentation to determine which tool aligns best with the intended messaging and branding goals for a particular project. The divergence between CopyAI and Texta.ai illustrates why translators and language service providers should explore various options, as each may offer distinct features, strengths, or specialized capabilities better suited to the task at hand. This principle extends beyond specialized GPT-3 tools to state-of-the-art LLM-powered chatbots like ChatGPT, Claude, or Google’s Gemini, since each offers complementary functionalities that may prove advantageous in different scenarios. In this case, CopyAI’s bold outdoor slogans resonated strongly with GentleTools’ desired branding, underscoring the vital human role in judiciously steering these technologies based on branding needs and audience preferences.
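The effect of the temperature setting can be illustrated with temperature-scaled softmax sampling, the general mechanism by which GPT-3-style models turn scores over candidate tokens into probabilities. A minimal sketch (the logit values are invented for illustration):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw token scores into probabilities. Lower temperature
    concentrates probability on the top candidate (more deterministic);
    higher temperature spreads it out (more variable)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                        # hypothetical scores for three tokens
cool = softmax_with_temperature(logits, 0.5)    # near-deterministic output
warm = softmax_with_temperature(logits, 2.0)    # more varied output
print(cool[0] > warm[0])  # True: low temperature favours the top token
```

Two applications built on the same model but configured with different temperatures would thus sample from noticeably different distributions, one explanation for the divergent CopyAI and Texta.ai suggestions.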
8 Navigating the AI Translation Frontier: Prospects and Considerations

This study indicates that with proper training, student translators can effectively use GPT-3 to improve marketing translation quality, surpassing professionals in adapting brand mottos after focused instruction. However, the small, specialized sample limits broad conclusions, as outliers or sampling quirks may have skewed the results. Future research should examine effects across diverse languages, text types, and evolving tools, and include consumer feedback through market testing to evaluate real-world responses to AI-assisted versus professional transcreations. Unlike traditional machine translation, LLMs like GPT-3 are not solely designed for translation. Trained on vast, varied datasets, they support applications like question-answering, post-editing, terminology extraction, and transcreation (Kenny 2022). This versatility, however, introduces risks such as hallucinations, biases, or logical inconsistencies, which can complicate critical translation tasks (Nimdzi Insights 2023). The study’s competitive pre/post-test format, with timed lab sessions, remote proctoring, and peer competition, prioritized variable control but reduced ecological validity. Real-world translation typically occurs independently, without surveillance or rigid constraints, limiting the applicability of findings to professional settings. Future studies in natural work environments could enhance authenticity. Comparing students and professionals under different conditions risks conflating factors. Professionals had more experience but no GPT-3 training, while students faced academic pressure and peer competition. Despite this, the comparison offers value. It benchmarks student readiness against industry standards, highlighting gaps to bridge for career entry, especially under Iran’s revised curriculum emphasizing practical skills.
Additionally, students’ post-test success underscores the need for professionals to pursue ongoing training to stay competitive amid advancing technologies. By comparing groups and assessing specialized training, this study highlights the need to equip students with modern skills and encourage lifelong learning among professionals. As language technologies evolve, both groups must adapt. The findings suggest GPT-3’s potential to enhance marketing translation when guided by skilled users through thoughtful prompting and experimentation. Prompt engineering and LLM literacy emerge as essential skills. While GPT-3 can generate locale-specific suggestions, its effectiveness depends on human direction, as seen in students’ tailored outputs. This reinforces the enduring need for human oversight, aligning with studies showing translators value tools that support their goals but struggle with inflexible technology (Ruokonen and Koskinen 2017). Echoing Douglas Adams’ The Hitchhiker’s Guide to the Galaxy, where the Babel fish’s literal translations caused cultural misunderstandings, unchecked LLMs risk similar errors by amplifying biases or generating implausible content. Yet, when translators refine outputs through iterative prompting, combining human expertise with AI’s pattern recognition, they create a powerful synergy. This “augmented translation” approach – where AI handles repetitive tasks and offers creative options, while humans provide cultural insight and judgment – enhances outcomes. As technology and human expertise continue to integrate, translation workflows will likely embrace this collaborative model, delivering superior results through thoughtful partnership.

References

Bayer-Hohenwarter, Gerrit. 2011. “‘Creative shifts’ as a means of measuring and promoting translational creativity.” Meta 56 (3): 663–92. https://doi.org/10.7202/1008339ar.
Bender, Emily M., Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell.[3] 2021.
“On the dangers of stochastic parrots: Can language models be too big?” In FAccT ’21: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610–23. Association for Computing Machinery.
Blumczynski, Piotr, and Ghodrat Hassani. 2019. “Towards a meta-theoretical model for translation: A multidimensional approach.” Target: International Journal of Translation Studies 31 (3): 328–51. https://doi.org/10.1075/target.17031.blu.
[3] The author Margaret Mitchell intentionally used the pseudonym “Shmargaret Shmitchell” in this publication.
Bostrom, Nick. 2014. Superintelligence: Paths, Dangers, Strategies. Oxford University Press.
Bowker, Lynne. 2023. De-mystifying Translation: Introducing Translation to Non-translators. Routledge.
Brown, Tom, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal, Arvind Neelakantan, et al. 2020. “Language models are few-shot learners.” Advances in Neural Information Processing Systems 33: 1877–1901. https://doi.org/10.48550/arXiv.2005.14165.
The Business Research Company. 2025. AI in Language Translation Global Market Report 2025. https://www.thebusinessresearchcompany.com/report/ai-in-language-translation-global-market-report.
Cronin, Michael. 2013. Translation in the Digital Age. Routledge.
Daems, Joke, and Lieve Macken. 2019. “Interactive adaptive SMT versus interactive adaptive NMT: A user experience evaluation.” Machine Translation 33 (1): 117–34. https://doi.org/10.1007/s10590-019-09230-z.
Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. “BERT: Pre-training of deep bidirectional transformers for language understanding.” In Proceedings of NAACL-HLT 2019, 4171–86. Association for Computational Linguistics.
Díaz-Millón, Mar. 2021.
“The role of transcreation in corporate communication: A case study in the US healthcare sector.” In Innovative Perspectives on Corporate Communication in the Global World, edited by María Dolores Olvera-Lobo, Juncal Gutiérrez-Artacho, and Irene Rivera-Trigueros, 159–76. IGI Global.
Díaz-Millón, Mar, and María Dolores Olvera-Lobo. 2023. “Towards a definition of transcreation: A systematic literature review.” Perspectives 31 (2): 347–64. https://doi.org/10.1080/0907676X.2021.2004177.
Doherty, Stephen. 2017. “Issues in human and automatic translation quality assessment.” In Human Issues in Translation Technology, edited by Dorothy Kenny, 131–48. Routledge.
The Economist. 2024. “Is Google’s Gemini chatbot woke by accident, or by design?” The Economist, February 28. https://www.economist.com/united-states/2024/02/28/is-googles-gemini-chatbot-woke-by-accident-or-design.
Han, Chao. 2020. “A critical methodological review of translation quality assessment.” The Translator 26 (3): 257–73. https://doi.org/10.1080/13556509.2020.1834751.
Harari, Yuval N. 2018. 21 Lessons for the 21st Century. Jonathan Cape.
Hassani, Ghodrat. 2011. “A corpus-based evaluation approach to translation improvement.” Meta 56 (2): 351–73. https://doi.org/10.7202/1006181ar.
Ho, Nga-Ki Mavis. 2024. Appraisal and the Transcreation of Marketing Texts: Persuasion in Chinese and English. Routledge.
Howard, John J., Eli J. Laird, Rebecca E. Rubin, Yevgeniy B. Sirotin, and Jerry L. Tipton. 2022. “Evaluating proposed fairness models for face recognition algorithms.” In International Conference on Pattern Recognition, 431–47. Springer Nature Switzerland.
Katan, David. 2001. “When difference is not dangerous: Modelling intercultural competence for business.” Textus XIV (2): 287–306.
—. 2013. “Intercultural mediation.” In Handbook of Translation Studies, edited by Yves Gambier and Luc Van Doorslaer, 84–91. John Benjamins.
—. 2016.
“Translation at the cross-roads: Time for the transcreational turn?” Perspectives 24 (3): 365–81. https://doi.org/10.1080/0907676X.2015.1016049.
—. 2018. “‘Translatere’ or ‘transcreare’: In theory and in practice and by whom?” In Translating and Interpreting Specific Texts, Contexts and Translation, edited by Cinzia Spinzi, Alessandra Rizzo and Marianna Lya Zummo, 139–60. University of Salento.
Katan, David, and Mustapha Taibi. 2021. Translating Cultures: An Introduction for Translators, Interpreters and Mediators. 3rd ed. Routledge.
Kenny, Dorothy. 2022. Machine Translation for Everyone: Empowering Users in the Age of Artificial Intelligence. Language Science Press.
Kenny, Dorothy, and Stephen Doherty. 2014. “Statistical machine translation in the translation curriculum: Overcoming obstacles and empowering translators.” The Interpreter and Translator Trainer 8 (2): 276–94. https://doi.org/10.1080/1750399X.2014.936112.
Landis, J. Richard, and Gary G. Koch. 1977. “The measurement of observer agreement for categorical data.” Biometrics 33 (1): 159–74. https://doi.org/10.2307/2529310.
Łukasik, Marek. 2024. “The future of the translation profession in the era of artificial intelligence: Survey results from Polish translators, translation trainers, and students of translation.” Lublin Studies in Modern Languages and Literature 48 (3): 25–39. https://doi.org/10.17951/lsmll.2024.48.3.25-39.
Marais, Kobus. 2021. “Complexity in translation studies.” In Handbook of Translation Studies, edited by Yves Gambier and Luc Van Doorslaer, 23–29. John Benjamins.
Marais, Kobus, and Rein Meylaerts. 2022. “Introduction.” In Exploring the Implications of Complexity Thinking for Translation Studies, edited by Kobus Marais and Rein Meylaerts, 1–6. Routledge.
Martin, J.R., and P.R.R. White. 2005. The Language of Evaluation: Appraisal in English. Palgrave Macmillan.
Mellinger, Christopher D. 2019.
“Computer-assisted interpreting technologies and interpreter cognition: A product and process-oriented perspective.” Tradumàtica: Tecnologies de la Traducció 17: 33–44. https://doi.org/10.5565/rev/tradumatica.228.
Mitchell-Schuitevoerder, Rosemary. 2020. A Project-Based Approach to Translation Technology. Routledge.
Moorkens, Joss. 2020. “‘A tiny cog in a large machine’: Digital Taylorism in the translation industry.” Translation Spaces 9 (1): 12–34. https://doi.org/10.1075/ts.00019.moo.
Nimdzi Insights. 2022. The Nimdzi 2022 Language Technology Atlas. https://www.nimdzi.com/nimdzi-language-technology-atlas-2022/.
—. 2023. The Nimdzi 2023 Language Technology Atlas. https://www.nimdzi.com/language-technology-atlas/.
—. 2024. The 2024 Nimdzi 100. https://www.nimdzi.com/nimdzi-100-2024/.
—. 2025. The 2025 Nimdzi 100. https://www.nimdzi.com/nimdzi-100-2025.
Olohan, Maeve. 2017. “Technology, translation and society: A constructivist, critical theory approach.” Target 29 (2): 264–83. https://doi.org/10.1075/target.29.2.04olo.
Pielmeier, Hélène, and Paul O’Mara. 2020. The State of the Linguist Supply Chain. CSA Research.
Rodríguez-Castro, Mónica. 2018. “An integrated curricular design for computer-assisted translation tools: Developing technical expertise.” The Interpreter and Translator Trainer 12 (4): 355–72. https://doi.org/10.1080/1750399X.2018.1502007.
Ruokonen, Minna, and Kaisa Koskinen. 2017. “Dancing with technology: Translators’ narratives on the dance of human and machinic agency in translation work.” The Translator 23 (3): 310–23. https://doi.org/10.1080/13556509.2017.1301846.
Tamkin, Alex, Miles Brundage, Jack Clark, and Deep Ganguli. 2023. “Understanding the capabilities, limitations, and societal impact of large language models.” arXiv: 1–8. https://doi.org/10.48550/arXiv.2102.02503.
Tingiris, Steve. 2021. Exploring GPT-3. Packt Publishing.
Torresi, Ira. 2010. Translating Promotional and Advertising Texts. Routledge.
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. “Attention is all you need.” Advances in Neural Information Processing Systems 30: 5998–6008. https://doi.org/10.48550/arXiv.1706.03762.
Way, Andy. 2020. “Machine translation: Where are we at today?” In The Bloomsbury Companion to Language Industry Studies, edited by Erik Angelone, Maureen Ehrensberger-Dow, and Gary Massey, 311–32. Bloomsbury Academic.
Zhang, Daniel, Saurabh Mishra, Erik Brynjolfsson, John Etchemendy, Deep Ganguli, Barbara Grosz, Terah Lyons, James Manyika, Juan Carlos Niebles, Michael Sellitto, Yoav Shoham, Jack Clark, and Raymond Perrault. 2021. The AI Index 2021 Annual Report. AI Index Steering Committee, Human-Centered AI Institute, Stanford University.

List of Contributors

Aja Barbič, University of Maribor, Slovenia, aja.barbic@student.um.si
Mladen Borovič, University of Maribor, Slovenia, mladen.borovic@um.si
Marija Brala Vukanović, University of Rijeka, Croatia, marija.brala@ffri.uniri.hr
Hossein Davari, Damghan University, Iran, h.davari@du.ac.ir
Melanija Larisa Fabčič, University of Maribor, Slovenia, melanija.fabcic@um.si
Andrej Flogie, University of Maribor, Slovenia, andrej.flogie@um.si
Nataša Gajšt, University of Maribor, Slovenia, natasa.gajst@um.si
Daniel Hari, University of Maribor, Slovenia, daniel.hari@um.si
Ghodrat Hassani, Damghan University, Iran, q.hassani@du.ac.ir
Tommy Hastomo, Universitas Negeri Malang (State University of Malang), Indonesia, tomhas182@gmail.com
David Hazemali, University of Maribor, Slovenia, david.hazemali@um.si
Francisca Maria Ivone, Universitas Negeri Malang (State University of Malang), Indonesia, fransicamaria@um.ac.id
Eva Jakupčević, University of Split, Croatia, ejakupcevic@ffst.hr
Saša Jazbec, University of Maribor, Slovenia, sasa.jazbec@um.si
Muhammad Fikri Nugraha Kholid, Universitas Islam Negeri Raden Intan Lampung (Raden Intan State Islamic University Lampung), Indonesia,
fikrikholid44@gmail.com
Agata Križan, University of Maribor, Slovenia, agata.krizan@um.si
Rashmika Lekamge, Sabaragamuwa University of Sri Lanka, Sri Lanka, rashmi@geo.sab.ac.lk
Bernarda Leva, University of Maribor, Slovenia, bernarda.leva@um.si
Marta Licardo, University of Maribor, Slovenia, marta.licardo@um.si
Marziyeh Malekshahi, Damghan University, Iran, m.malekshahi@du.ac.ir
Silvana Neshkovska, University “St. Kliment Ohridski”, Bitola, North Macedonia, silvana.neskovska@uklo.edu.mk
Tomaž Onič, University of Maribor, Slovenia, tomaz.onic@um.si
Zmago Pavličič, University of Maribor, Slovenia, zmago.pavlicic@gmail.com
Bojan Prosenjak, University of Zagreb, Croatia, bprosenj@m.ffzg.hr
Andini Septama Sari, Universitas Negeri Malang (State University of Malang), Indonesia, andinisari@gmail.com
Clayton Smith, University of Windsor, Canada, clayton.smith@uwindsor.ca
Tadej Todorović, University of Maribor, Slovenia, tadej.todorovic@um.si
Utami Widiati, Universitas Negeri Malang (State University of Malang), Indonesia, utami.widiati@um.ac.id
Evynurul Laily Zen, Universitas Negeri Malang (State University of Malang), Indonesia, evynurullailyzen@um.ac.id
Simon Zupan, University of Maribor, Slovenia, simon.zupan@um.si

GUIDELINES FOR CONTRIBUTORS

ELOPE: English Language Overseas Perspectives and Enquiries

ELOPE publishes original research articles, studies and essays that address matters pertaining to the English language, literature, teaching and translation.

Submission of Manuscripts

Manuscripts should be submitted for blind review in electronic form using the Faculty of Arts (University of Ljubljana) OJS platform (https://journals.uni-lj.si/elope/about/submissions). Only one contribution by the same author per volume will be considered. Each paper should be accompanied by abstracts in English and Slovene and by keywords. Abstracts by non-native speakers of Slovene will be translated into Slovene by ELOPE.
Please be sure to have a qualified native speaker proofread your English-language article. The suggested length of manuscripts is between 5,000 and 8,000 words.

Manuscript Style and Format

The manuscript should be in the following format:
• title in English (no longer than 100 characters including spaces),
• abstracts in English and Slovene (each no longer than 150 words) and up to five keywords,
• the text divided into an introduction, the body of the paper (possibly subdivided) and a conclusion.

The text should preferably be submitted in Word format (OpenOffice and RTF files are also acceptable). Please observe the following:
• 12-point Times New Roman font,
• 2.5 cm page margins on all sides,
• 1.5 line spacing,
• left text alignment,
• brief footnotes (up to 300 words per page; 10-point Times New Roman font).

For practical style and formatting queries, please see the articles in the latest online issue or contact the technical editor.

References

References should comply with the author-date system of The Chicago Manual of Style (18th edition, 2024). A quick guide to CMS is available at https://www.chicagomanualofstyle.org/tools_citationguide/citation-guide-2.html.

Final Note

Please note that only manuscripts fully adhering to the ELOPE Guidelines for Contributors will be considered for publication.

ELOPE Vol. 22, No.
1 (2025)

Guest Editors
Tomaž Onič, University of Maribor, Slovenia
David Hazemali, University of Maribor, Slovenia
Mladen Borovič, University of Maribor, Slovenia

Journal Editors
Smiljana Komar, University of Ljubljana, Slovenia
Mojca Krevel, University of Ljubljana, Slovenia

Editorial Board
Lisa Botshon, University of Maine at Augusta, United States of America; Biljana Čubrović, University of Belgrade, Serbia; Michael Devine, Acadia University, Canada; Dušan Gabrovšek, University of Ljubljana, Slovenia; Michelle Gadpaille, University of Maribor, Slovenia; Meta Grosman, University of Ljubljana, Slovenia; Allan James, University of Klagenfurt, Austria; Victor Kennedy, University of Maribor, Slovenia; Bernhard Kettemann, University of Graz, Austria; Alberto Lázaro, University of Alcalá de Henares, Spain; J. Lachlan Mackenzie, VU University Amsterdam, Netherlands; Tomaž Onič, University of Maribor, Slovenia; Roger D. Sell, Åbo Akademi University, Finland; Andrej Stopar, University of Ljubljana, Slovenia; Rick Van Noy, Radford University, United States of America; Terri-ann White, University of Western Australia, Australia

Editorial Secretary
Gašper Ilc, University of Ljubljana, Slovenia

Technical Editor
Andrej Stopar, University of Ljubljana, Slovenia

Proofreading
Michelle Gadpaille

Editorial Policy
ELOPE: English Language Overseas Perspectives and Enquiries is a double-blind, peer-reviewed academic journal that publishes original research articles, studies and essays addressing matters pertaining to the English language, literature, teaching and translation. The journal promotes the discussion of linguistic and literary issues from theoretical and applied perspectives, regardless of school of thought or methodology. Covering a wide range of issues and concerns, ELOPE aims to investigate and highlight the themes explored by contemporary scholars in the diverse fields of English studies.
Published by
University of Ljubljana Press
Založba Univerze v Ljubljani
For the Publisher: Gregor Majdič, Rector of the University of Ljubljana

Issued by
Slovene Association for the Study of English
Slovensko društvo za angleške študije
Department of English, Faculty of Arts, University of Ljubljana
Oddelek za anglistiko in amerikanistiko, Filozofska fakulteta, Univerza v Ljubljani
Ljubljana University Press, Faculty of Arts
Znanstvena založba Filozofske fakultete Univerze v Ljubljani
For the Issuer: Mojca Schlamberger Brezar, Dean of the Faculty of Arts, University of Ljubljana

The journal is published with support from the Slovenian Research and Innovation Agency. The publication is free of charge.

Universal Decimal Classification (UDC): Kristina Pegan Vičič

Journal Design: Gašper Mrak

Cover: Marjan Pogačnik, Zimsko cvetje (Winter Flowers), 1994; 7.6 × 10.0 cm; colour etching, deep relief. Owner: National Gallery, Ljubljana. Photo: Bojan Salaj, National Gallery, Ljubljana.

Printed by Birografika Bori

Number of Copies: 110

https://doi.org/10.4312/elope.22.1

Online ISSN: 2386-0316
Print ISSN: 1581-8918

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.