INFORMATION SOCIETY – IS 2025 (INFORMACIJSKA DRUŽBA – IS 2025)
Proceedings of the 28th International Multiconference, Volume B (Zbornik 28. mednarodne multikonference, Zvezek B)
Cognitive Science (Kognitivna znanost)
Editors: Anka Slana Ozimič, Borut Trpin, Toma Strle
http://is.ijs.si
9 October 2025, Ljubljana, Slovenia

Editors:
Anka Slana Ozimič, Faculty of Arts, University of Ljubljana
Borut Trpin, Faculty of Arts, University of Ljubljana
Toma Strle, Center for Cognitive Science, Faculty of Education, University of Ljubljana

Publisher: Institut »Jožef Stefan« (Jožef Stefan Institute), Ljubljana
Proceedings prepared by: Mitja Lasič, Vesna Lasič, Lana Zemljak
Cover design: Vesna Lasič, using an image from Pixabay
Access to the e-publication: http://library.ijs.si/Stacks/Proceedings/InformationSociety
Ljubljana, October 2025
Informacijska družba, ISSN 2630-371X
DOI: https://doi.org/10.70314/is.2025.cs
Cataloguing-in-publication (CIP) record prepared by the National and University Library in Ljubljana
COBISS.SI-ID 255439107
ISBN 978-961-264-320-1 (PDF)

FOREWORD TO THE MULTICONFERENCE INFORMATION SOCIETY 2025

The 28th International Multiconference on the Information Society takes place at a time of remarkable growth in artificial intelligence, its applications, and its impact on humanity. Each year we enter a new era in which generative AI and other innovative approaches shape the path toward superintelligence and singularity, phenomena that will shape the future of human civilization. The conference is both a traditional scientific forum and an academically open incubator for new, bold ideas and perspectives. In addition to artificial intelligence, this year’s conference addresses other pressing issues of our time: environmental preservation, demographic challenges, healthcare, and the transformation of social structures. The rapid development of AI offers potential solutions to many of today’s challenges and highlights the importance of collaboration among researchers, experts, and policymakers in designing sustainable strategies.
We are acutely aware that we live in an era of profound change, where innovative approaches and deep knowledge are essential to creating an information society that is safe, inclusive, and sustainable. This year’s multiconference brings together twelve thematically diverse meetings reflecting the breadth and depth of the information sciences: from artificial intelligence in healthcare, demographic and family studies, and the digital transformation of nursing and digital inclusion, to research in cognitive science, healthy longevity, and education in the information society. Additional conferences include Legends of Computing and Informatics, Technology Transfer, Myths and Truths of Environmental Protection, Knowledge Discovery and Data Warehouses, and, of course, the Slovenian Conference on Artificial Intelligence. Alongside scientific papers, round tables and workshops will provide opportunities for in-depth exchanges of views, making an important contribution to shaping the future information society. Legends of Computing and Informatics serves as a national »Hall of Fame« honoring outstanding individuals in the field. We will continue to promote research and development, excellence, and collaboration. Extended papers will be published in the journal Informatica, supported by a long-standing tradition and in cooperation with academic institutions and professional associations such as ACM Slovenia, SLAIS, the Slovenian Society Informatika, and the Slovenian Academy of Engineering. Each year we recognize the most distinguished achievements. In 2025, the Michie-Turing Award for lifetime contribution to the development and promotion of the information society was awarded to Niko Schlamberger, while the Award for Research Achievement of the Year went to Tome Eftimov. The »Information Lemon« for the least appropriate information-related topic was awarded to the absence of compulsory computer science education in primary schools. 
The »Information Strawberry« for the best system or service in 2024/2025 was awarded to Marko Robnik Šikonja, Domen Vreš and Simon Krek together with their team, for developing the Slovenian large language model GAMS. We extend our warmest congratulations to all awardees. Our vision remains clear: to identify, seize, and shape the opportunities offered by digital transformation, and to create an information society that benefits all its members. We sincerely thank all participants for their contributions and look forward to jointly shaping the future achievements that this conference will help bring about.

Mojca Ciglarič, Chair of the Program Committee
Matjaž Gams, Chair of the Organizing Committee

CONFERENCE COMMITTEES (KONFERENČNI ODBORI)

International Programme Committee
Vladimir Bajic, South Africa; Heiner Benking, Germany; Se Woo Cheon, South Korea; Howie Firth, UK; Olga Fomichova, Russia; Vladimir Fomichov, Russia; Vesna Hljuz Dobric, Croatia; Alfred Inselberg, Israel; Jay Liebowitz, USA; Huan Liu, Singapore; Henz Martin, Germany; Marcin Paprzycki, USA; Claude Sammut, Australia; Jiri Wiedermann, Czech Republic; Xindong Wu, USA; Yiming Ye, USA; Ning Zhong, USA; Wray Buntine, Australia; Bezalel Gavish, USA; Gal A. Kaminka, Israel; Mike Bain, Australia; Michela Milano, Italy; Derong Liu, Chicago, USA; Toby Walsh, Australia; Sergio Campos-Cordobes, Spain; Shabnam Farahmand, Finland; Sergio Crovella, Italy

Organizing Committee
Matjaž Gams (chair), Mitja Luštrek, Lana Zemljak, Vesna Koricki, Mitja Lasič, Blaž Mahnič

Programme Committee
Mojca Ciglarič (chair), Marjan Heričko, Boštjan Vilfan, Bojan Orel, Borka Jerman Blažič Džonova, Baldomir Zajc, Franc Solina, Gorazd Kandus, Blaž Zupan, Viljan Mahnič, Urban Kordeš, Boris Žemva, Cene Bavec, Marjan Krisper, Leon Žlajpah, Tomaž Kalin, Andrej Kuščer, Niko Zimic, Jozsef Györkös, Jadran Lenarčič, Rok Piltaver, Tadej Bajd, Borut Likar, Toma Strle, Jaroslav Berce, Janez Malačič, Tine Kolenik, Mojca Bernik, Olga Markič, Franci Pivec, Marko Bohanec, Dunja Mladenič, Uroš Rajkovič, Ivan Bratko, Franc Novak, Borut Batagelj, Andrej Brodnik, Vladislav Rajkovič, Tomaž Ogrin, Dušan Caf, Grega Repovš, Aleš Ude, Saša Divjak, Ivan Rozman, Bojan Blažica, Tomaž Erjavec, Niko Schlamberger, Matjaž Kljun, Bogdan Filipič, Gašper Slapničar, Robert Blatnik, Andrej Gams, Stanko Strmčnik, Erik Dovgan, Matjaž Gams, Jurij Šilc, Špela Stres, Mitja Luštrek, Jurij Tasič, Anton Gradišek, Marko Grobelnik, Denis Trček, Nikola Guid, Andrej Ule

TABLE OF CONTENTS (KAZALO)

Cognitive Science (Kognitivna znanost)
Foreword (Predgovor)
Programme Committees (Programski odbori)
Robot-Assisted Hand Rehabilitation in Children and Adolescents Several Years After Acute-Onset Brain Injury (Rehabilitacija roke z robotsko podprto obravnavo pri otrocih in mladostnikih več let po akutno nastali možganski okvari) / Bregant Tina, Pavlinič Renata, Šinkovec Patricija
Staring, Guessing, and Imagining: Strategies in Visual Working Memory / Bušelič Benjamin, Purg Suljič Nina, Jablanovec Andrej, Repovš Grega, Slana Ozimič Anka
Machine Bias: New Experiments With COMPAS Data / Farič Ana, Bratko Ivan
A Comparison of the Properties of Human Cognition and Artificial Intelligence (Primerjava lastnosti človeške kognicije in umetne inteligence) / Jamšek Monika, Smodiš Rok, Jordan Marko, Gams Matjaž
Coherentist Echo Chambers / Justin Martin, Trpin Borut
Large Language Models for Psychiatric Interview Analysis: An Exploratory Pilot Study / Lodrant Katarina, Melinščak Filip, Beris Ayse Nur, Schneider Valentin, Czernin Klara, Bangerl Waltraud, Bründlmayer Anselm, Scharnowski Frank, Laczkovics Clarissa, Steyrl David
Passing the Turing Test, Failing Consciousness: Why LLMs Remain Non-Conscious / Mono Louis
Building an Ontology of the Self: Sense of Agency and Bodily Self / Oprešnik Luka, Križan Tia, Caporusso Jaya
Modeling Nonlinear Change in Psychotherapy: Toward an AI Decision-Support System With Synthetic Client Data / Šonc Oskar, Smodiš Rok, Kolenik Tine, Schiepek Günter, Aichhorn Wolfgang
What Words Reveal About Mental Health: A Computational Language Analysis Around Phase Transitions in Psychotherapy / Šutar Mateja, Kolenik Tine, Schiepek Günter, Aichhorn Wolfgang
Measuring Therapist–Client Synchrony to Forecast Change Dynamics: EMA-based Protocol Pilot / Vajda Matej, Kolenik Tine, Rožič Tatjana, Kovačević Tojnko Nuša, Slapničar Gašper, Možina Miran, Schiepek Günter, Aichhorn Wolfgang
Towards a Possible Solution of Chalmers’ Hard Problem and to Definitions of Life and Consciousness / Vitas Marko
An Analysis of the Cognitive Capabilities of LLMs: Strategic Planning Using the Tower of London Test (Analiza kognitivnih zmogljivosti LLM: Strateško načrtovanje z uporabo testa Tower of London) / Žužek Katarina, Gams Matjaž
Author index (Indeks avtorjev)

FOREWORD

Welcome to this year’s Cognitive Science conference, held within the multiconference Information Society.
Once again, the conference brings together researchers who share an interest in cognitive processes and their place within the broader natural and social context. Cognitive science is an interdisciplinary research field that integrates philosophy, psychology, neuroscience, linguistics, computer science, artificial intelligence, and related disciplines. It is precisely at the intersection of these diverse approaches that new questions, methods, and solutions emerge, enriching our understanding of cognition and opening the way to innovative applications. This year’s program reflects this diversity. Philosophical contributions address fundamental questions concerning consciousness, life, and the so-called “hard problem”; others focus on issues in social epistemology. Several papers investigate large language models: their cognitive capacities, their role in the analysis of psychiatric interviews, and the relation between successful linguistic performance and the absence of consciousness. Empirical and applied contributions deal with robot-assisted rehabilitation, the monitoring of phase transitions in psychotherapy, therapist–client synchrony, the construction of ontologies of the self, and the study of strategies in working memory. Taken together, these contributions demonstrate that cognitive science in Slovenia and beyond remains a dynamic field of research, continuously opening itself to new challenges. A special place is reserved for the keynote lecture by Prof. Olga Markič, one of the central figures in the development of cognitive science in Slovenia. Her work has significantly contributed to the establishment of the interdisciplinary approach and to the formation of the research community of which we are part today. The program also includes a round table on trust. This theme transcends disciplinary boundaries and touches upon epistemology and ethics as well as psychology, sociology, and research on artificial intelligence. 
Trust is a crucial condition for scientific collaboration, for the functioning of social institutions, and for the responsible use of new technologies. The Cognitive Science 2025 conference continues to serve as a venue for encounters and dialogue among researchers from different disciplines and generations. We hope that this year’s meeting will once again stimulate fruitful exchanges of ideas, foster new collaborations, and inspire collective reflection on the future directions of cognitive science. Welcome!

Anka Slana Ozimič
Borut Trpin
Toma Strle

PROGRAMME COMMITTEE (PROGRAMSKI ODBOR)

Anka Slana Ozimič, Faculty of Arts, University of Ljubljana
Borut Trpin, Faculty of Arts, University of Ljubljana
Toma Strle, Center for Cognitive Science, Faculty of Education, University of Ljubljana
Olga Markič, Faculty of Arts, University of Ljubljana
Urška Martinc, Center for Cognitive Science, Faculty of Education, University of Ljubljana

Robot-Assisted Hand Rehabilitation in Children and Adolescents Several Years After Acute-Onset Brain Injury
(Rehabilitacija roke z robotsko podprto obravnavo pri otrocih in mladostnikih več let po akutno nastali možganski okvari)

Tina Bregant †, CIRIUS Kamnik, Slovenia, tina.bregant@cirius-kamnik.si
Renata Pavlinič, CIRIUS Kamnik, Slovenia, renata.pavlinic@cirius-kamnik.si
Patricija Šinkovec, CIRIUS Kamnik, Slovenia, patricija.sinkovec@cirius-kamnik.si

Abstract

Background: Reorganisation and plasticity of the cerebral cortex and the corticospinal tract are important for the recovery of hand function. To stimulate and modulate neuronal plasticity we use rehabilitation strategies: early interventions with repetitive, goal-directed, intensive therapy (motor training, constraint-induced training, robotic training), which contribute to better recovery and restoration of hand function.

Aims: To determine the effect on hand function of occupational therapy with robot-assisted treatment, compared with the effects of conventional occupational therapy approaches.

Methods: The 4-week study included 32 children and adolescents (15 of them female) with impaired upper-limb function due to an acute-onset brain injury sustained several years earlier. The experimental group, which received robotic training, comprised 9 females and 7 males with a mean age of 17.9 years; the control group (standard occupational therapy) comprised 6 females and 10 males with a mean age of 16.85 years. Most of the injuries occurred perinatally, that is, at birth or in the first weeks after birth. Hand function was assessed with standard instruments (ARAT, Box&Blocks, muscle strength). Because of the small sample, and to convey the improvement in hand function more clearly, we used descriptive statistics in the analysis.

Results: After completion of the therapies, the ARAT scores and the upper-limb muscle strength measurements showed greater progress in the experimental group, whereas the control group achieved better results on the Box&Blocks test. After completing the therapies, the children and adolescents gave their subjective opinion on their satisfaction with the therapies.

Conclusions: Both conventional occupational therapy approaches and robot-assisted treatment have an important effect on improving upper-limb function, even several years after the injury. The group using the robotic device made greater progress in fine motor skills, and the group receiving conventional approaches made greater progress in gross motor skills. Satisfaction prevailed with both therapies. We conclude that combining the two approaches is sensible, since it improves both gross and fine motor skills.

Keywords

Hand, upper limb, motor cortex, corticospinal tract, plasticity, occupational therapy, robotic glove

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
Information Society 2025, 6–10 October 2025, Ljubljana, Slovenia
© 2025 Copyright held by the owner/author(s).

1 Introduction

Hand function is crucial for maintaining independence and self-care in daily activities. Restored hand function is therefore often one of the most important goals for patients with brain injury [1]. After a brain injury, hand function usually improves slowly, most often only after the function of the trunk and lower limbs has improved; fine motor skills, which involve the small muscles of the hand, are the last to return. Up to 80% of adult stroke survivors have upper-limb impairments, and only a few achieve full functional recovery 6 months after the stroke [2]. Loss of upper-limb function is therefore one of the factors that reduce overall quality of life, with an important effect on daily activities, social activities and, in adults, return to work. In children and adolescents, success at school depends on fine motor skills, eye-eye and eye-hand coordination, and graphomotor skills, all of which can be substantially impaired after brain injury.

The extent of motor impairment in acute ischaemic stroke depends mainly on the extent and integrity of the damaged corticospinal tract. Only for regions where the corticospinal tract is highly compacted, such as the pons, is the correlation between motor impairment and the size of the ischaemic lesion weak. In patients with better-preserved corticospinal tract integrity, improvement after acute ischaemia is greater and rehabilitation is more successful. The corticospinal tract (also called the pyramidal tract) is a bundle of fibres that connects the cerebral cortex with the spinal cord and enables voluntary limb movement. Most fibres (75–90%) cross to the opposite side in the medulla oblongata (the pyramidal decussation) and terminate in the muscle groups of the limbs [3]. In children, the nervous system and its connections are still maturing; gross and fine motor skills mature, and adult movement patterns develop only some years after birth. The consequences of a perinatal injury therefore cannot be fully assessed immediately after the event, and therapeutic procedures continue for several years after the injury.

2 Assessment of hand function

Muscle strength and joint range of motion are important for upper-limb function. Grip strength and the strength of the hand (fist) squeeze can be measured with a dynamometer, and range of motion with a goniometer. Several scales exist for assessing hand function. No single test is universal or covers all domains well enough to be used uniformly regardless of the hand pathology. The hand is so complex that its full function requires anatomical integrity, mobility, muscle strength, sensitivity, precision, dexterity and coordination, specific gross and fine motor skills (grips), bimanual coordination, and good cognitive control as well. Measurements alone are somewhat limited in assessing hand function, so we additionally assess the success and speed with which practical tasks are performed [4].

Upper-limb muscle strength is measured with a dynamometer. The measurement is generally used to objectify progress or changes in upper-limb muscle strength. As a rule, three consecutive measurements are taken with each hand, and their mean is taken as the final result. Results are recorded in kilograms (kg). Standardised tests that can be used to assess hand function include the nine-hole peg test (NHPT), developed to evaluate finger dexterity (fine manual dexterity); the box and blocks test (BBT), which assesses gross motor skills and lateralisation; the Jebsen-Taylor hand function test (JTHFT), which assesses lateralisation and gross and fine motor skills; and the action research arm test (ARAT). Classification systems include the bimanual fine motor function (BFMF) scale and the manual ability classification system (MACS), which classify children with cerebral palsy according to their ability to handle objects in daily activities. The functional independence measure (FIM) is useful for assessing overall status [4].

3 Recovery of hand function

Spasticity (in 20–40% of cases) and weakness (spastic paresis) are the most common problems after brain injury [5, 6]. Recovery, in the sense of regained strength, normalised tone and thereby restored motor function, is attributed mainly to rapid reorganisation (plasticity) of the cortex and the corticospinal tract, whereas maladaptive plasticity and hyperexcitability of the reticulospinal tract most likely cause the most problems. Stimulating and modulating neuronal plasticity through rehabilitation strategies is the key to functional motor recovery; such strategies include early interventions with repetitive, goal-directed, intensive therapy (e.g. motor training), appropriate non-invasive brain stimulation (e.g. neuromodulation with transcranial stimulation) and pharmacological agents (including local application of botulinum toxin) [7]. Synaptic connections in the central nervous system are plastic, meaning that they can be modified through learning [8].

4 (Re)habilitation

Rehabilitation often focuses on better joint mobility, greater strength and better function. Robotic assistance can support plastic reorganisation of the fibres of the corticospinal tract [9]. Technology strengthens the mechanisms of the biomechanical feedback loop (biofeedback) [10]. This increases the flow of information about movement beyond what is otherwise available, and that information may conflict with sensory (or intrinsic) feedback [11], because robotic movement "imposes" more correct movements. Augmented movement feedback has greater clinical effects than sensory feedback. It also promotes neuronal plasticity after brain injury [12]. A robotic device allows patients to be trained in an intensive, task-oriented, top-down manner, which increases patient adherence and motivation. Top-down cognitive stimulation is provided by visual feedback delivered through specially designed games, which we also use [13]. With a robot, the required movement pattern can additionally be optimised through machine learning. The complexity of a motor task can therefore be controlled more precisely with robotics than with conventional treatment approaches.

Research shows that new rehabilitation protocols can still achieve motor improvement later, even a year after the event, although the improvement is less pronounced than in the early phase [14, 15]. Such programmes include constraint-induced movement therapy (CIMT) [16] and robotic training [15, 16].

CIMT, in which the function of the unaffected hand is restricted, has proved very beneficial in intensive rehabilitation, although it can also be frustrating. This therapeutic procedure encourages use of the impaired upper limb during various activities. It thereby promotes plasticity and reorganisation of the brain and so contributes to improved function of the impaired upper limb [4].

5 Study

Methods

The study included 32 children and adolescents with impaired upper-limb function due to a brain injury that occurred perinatally (at birth or in the first weeks after birth). The experimental group comprised 9 females and 7 males, with a mean age of 17.9 years. The control group comprised 6 females and 10 males, with a mean age of 16.85 years. The study ran continuously for 4 weeks. We were interested in the effect on upper-limb function of intensive therapies supported by a robotic device, compared with therapies based on conventional occupational therapy approaches, and in a comparison of the results.

Assessments (ARAT, Box&Blocks and dynamometer measurement of muscle strength) were carried out before the intensive therapies began. All forms of therapy took place 3 times a week. Sessions with the Syrebo robotic device lasted 20 minutes, and conventional occupational therapy sessions lasted 30 minutes. Therapy with the robotic device included passive mobilisation, resistance exercises, active exercises and functional exercises. A robotic glove was fitted to the impaired upper limb. The device allowed us to choose among different programmes and functions, and on the basis of these programmes the difficulty of the therapy was increased weekly. Conventional occupational therapy sessions were divided into three parts: 10 minutes of passive relaxation of the shoulder girdle, upper arm, forearm and wrist; 10 minutes on a therapy cycle; and 10 minutes of suitable activities focused on hand function (HomeClinico, activities on the Movi table, activities to improve muscle strength and range of motion, and similar). The type and intensity of the activities were adapted to each individual. After the intensive therapies, ARAT, Box&Blocks and dynamometer testing were repeated in both groups. Participants also completed a non-standardised satisfaction questionnaire.

Because of the small sample, and to convey the improvement in hand function more clearly, we used descriptive statistics in the analysis.

Results

The results showed improved hand function in both the experimental and the control group. As shown in Figure 1, the experimental group made greater progress on the ARAT and in dynamometer-measured muscle strength, whereas the control group made greater progress on Box&Blocks. On the ARAT, an improvement of 9.53% was recorded in the experimental group and 5.69% in the control group. The Box&Blocks results showed greater progress in the control group, namely by 2.34%.

Figure 1: Progress at reassessment after the intensive therapies, comparing the experimental and control groups by assessment method.

After the therapies, upper-limb muscle strength increased by 7.60% in the experimental group and by 4.60% in the control group. Comparing the two groups at the final assessment, the gain in muscle strength was 3.00 percentage points greater in the experimental group. The muscle strength results are shown in Figure 2. In some children and adolescents, the final assessment results were worse than at the first assessment because of fluctuations in process skills (attention, concentration, daily mood), which are related to their brain injury and its effect on fatigue and process abilities.

Figure 2: Results of dynamometer testing of upper-limb muscle strength in the experimental and control groups before and after the intensive treatment.

After the therapies, we administered a non-standardised satisfaction questionnaire. The majority of positive responses indicate that the experimental approach using the robotic glove has a positive effect on gross motor skills and sensory perception, reduces muscle tension in the upper limbs and increases motivation for therapy. One participant, however, experienced a negative side effect, namely increased muscle tone throughout the body. All opinions were given subjectively.

6 Conclusion

In this paper we have described what happens to hand function after brain injury and how hand function can be influenced by targeted therapeutic methods. Alongside spontaneous recovery, rehabilitation relies on brain plasticity, which we stimulate appropriately through motor training and occupational therapy, or through modern technology (neuromodulation techniques of cortical stimulation, robotics, virtual reality, game principles). With advances in technology and in our understanding of how the nervous system works, we hope to achieve the best possible function after brain injury, especially of the upper limb, until medicine advances far enough to replace lost neurons and their connections or to prevent adverse events in the nervous system effectively.

Patient safety is an important consideration in robot-assisted treatment. With a robotic device that allows the force and intensity of passive exercises to be adjusted, injury can be prevented in cases of increased muscle tone or muscle cramp. Safe treatment requires knowledge of how the robotic device works and an appropriately trained professional, in our case an occupational therapist. To ensure safety, constant supervision by an occupational therapist is required during treatment.

After completing the study, we found that the robotic device for upper-limb rehabilitation acts only on individual joints of the upper limb, whereas conventional occupational therapy approaches address the whole upper limb as well as postural control of the trunk. After therapy with the robotic device, greater progress was observed in fine motor skills of the upper limbs; after therapy with conventional occupational therapy approaches, greater progress was observed in gross motor skills.

On the basis of this study, we find it sensible to combine the two approaches so as to cover upper-limb function as a whole.

References

[1] Jorgensen HS, Nakayama H, Raaschou HO, Vive-Larsen J, Stoier M, Olsen TS. Outcome and time course of recovery in stroke. Part II: time course of recovery. The Copenhagen Stroke Study. Arch Phys Med Rehabil 1995; 76: 406–12.
[2] Hayward KS, Kramer SF, Thijs V, Ratcliffe J, Ward NS, Churilov L et al. A systematic review protocol of timing, efficacy and cost effectiveness of upper limb therapy for motor recovery post-stroke. Syst Rev 2019; 8(1): 187.
[7] Dalamagkas K, Tsintou M, Rathi Y, O'Donnell LJ, Pasternak O, Gong X et al. Individual variations of the human corticospinal tract and its hand-related motor fibers using diffusion MRI tractography. Brain Imaging Behav 2020; 14(3): 696–714.
[8] Kamper DG, Fischer HC, Cruz EG, Rymer WZ. Weakness is the primary contributor to finger impairment in chronic stroke. Arch Phys Med Rehabil 2006; 87: 1262.
[9] Zorowitz RD, Gillard PJ, Brainin M. Poststroke spasticity: sequelae and burden on stroke survivors and caregivers. Neurology 2013; 80: S45–52.
[10] Li S. Spasticity, motor recovery, and neural plasticity after stroke. Front Neurol 2017; 8: 120.
[11] Classen J, Liepert J, Wise SP, Hallett M, Cohen LG. Rapid plasticity of human cortical movement representation induced by practice. J Neurophysiol 1998; 79: 1117–23.
[12] Cinnera AM, Bonnì S, D’Acunto A. Cortico-cortical stimulation and robot-assisted therapy (CCS and RAT) for upper limb recovery after stroke: study protocol for a randomised controlled trial. Trials 2023; 24: 823.
[13] Giggins OM, Persson UM, Caulfield B. Biofeedback in rehabilitation. J Neuroeng Rehabil 2013; 10: 60.
[3] Rong D, Zhang M, Ma Q, Lu J, Li K. Corticospinal tract change during motor recovery in patients with medulla infarct: a diffusion tensor imaging study. Bio-med Res Int 2014; 2014: 524096.
[4] Bregant T et al. Rehabilitacija roke pri otrocih in mladostnikih po možganski okvari. Slovenska pediatrija 2024; 31: 180–187. doi.org/10.38031/slovpediatr-2024-4-02.
[5] Bregant T, Derganc M, Neubauer D. Uporaba magnetnoresonančnega slikanja z difuzijskimi tenzorji v pediatriji. Zdrav Vestn 2012; 81: 533–42.
[6] Yoo YJ, Kim JW, Kim JS, Hong BY, Lee KB, Lim SH. Corticospinal tract integrity and long-term hand function prognosis in patients with stroke. Front Neurol 2019; 10: 374.
[14] Morone G, Spitoni GF, De Bartolo D, Ghanbari Ghooshchy S, Di Iulio F, Paolucci S et al. Rehabilitative devices for a top-down approach. Expert Rev Med Devices 2019; 16 (3): 187–95.
[15] Poli P, Morone G, Rosati G, Masiero S. Robotic technologies and rehabilitation: new tools for stroke patients' therapy. Biomed Res Int 2013; 2013: 153872.
[16] Hatem SM, Saussez G, Della Faille M, Prist V, Zhang X, Dispa D et al. Rehabilitation of motor function after stroke: a multiple systematic review focused on techniques to stimulate upper extremity recovery. Front Hum Neurosci 2016; 10: 442.

Staring, Guessing, and Imagining: Strategies in Visual Working Memory

Benjamin Bušelič (benjamin.buselic@ff.uni-lj.si)
Nina Purg Suljič (nina.purg@ff.uni-lj.si)
Andrej Jablanovec (andrej.jablanovec@gmail.com)
Grega Repovš (grega.repovs@ff.uni-lj.si)
Anka Slana Ozimič (Anka.SlanaOzimic@ff.uni-lj.si)
All authors: University of Ljubljana, Faculty of Arts, Department of Psychology, Slovenia

Abstract

Although working memory (WM) capacity is often treated as a stable limit, performance in WM tasks is not determined by capacity alone. Emerging evidence suggests that it is also influenced by the strategies individuals adopt to meet task demands – a factor that remains insufficiently explored. This study investigated how strategy use in visual working memory varies depending on the specific requirements of the task, namely the features and combinations of features of visual stimuli to be remembered. Forty-eight students completed a visual WM span task in which they had to remember colors, shapes or both properties of visual stimuli. When both features had to be remembered, colors and shapes were either presented in separate objects (both separate condition) or combined within the same objects (both integrated condition). Following each task condition, participants reported how often they had used specific strategies by completing a strategy questionnaire. Results showed that visually oriented strategies (e.g., focusing on visual features and imagery) were most common across all conditions. Significant task condition effects emerged for the staring and guessing strategies, which were reported most often in the both separate condition. Furthermore, active pattern search was positively correlated with WM span in the colors condition, while passive waiting was negatively correlated with WM span in the both separate condition. These findings highlight that performance in WM tasks reflects not only capacity limits but also the strategies individuals adopt.

Keywords

Visual working memory, working memory strategies, task condition, working memory span

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
Information Society 2025, 6–10 October 2025, Ljubljana, Slovenia
© 2025 Copyright held by the owner/author(s).
https://doi.org/10.70314/is.2025.cogni.11

1 Introduction

The current understanding of working memory (WM) is often based on the multicomponent model of WM proposed by Baddeley and Hitch [1], who conceptualized WM as a system used for the short-term maintenance and manipulation of information [2]. Previous research [3, 4] has highlighted the importance of WM in everyday tasks, including language and reading comprehension, problem solving, and learning.

Given this central role in cognition, researchers have been particularly interested in the capacity limits of WM. Initial efforts to estimate this capacity suggested that it is highly limited. Miller [5] introduced the concept of the magical number 7 ± 2, describing individuals' WM capacity as the ability to retain approximately 7 ± 2 units of information. However, later studies, particularly in the domain of visual WM, have proposed even lower estimates. Cowan [6] estimated the capacity of visual WM to be closer to 3–4 items.

While these estimates help define the capacity limits of WM, task performance is not determined by capacity alone. Individuals often employ strategies that allow them to optimize how information is encoded and maintained. Such strategies do not increase capacity per se but can improve task performance by making more efficient use of available capacity.

Miller [5] described the phenomenon of chunking, a strategy in which individuals combine separate units of information into larger, meaningful ones (e.g., instead of remembering numbers 2 and 3 separately, they are stored together as 23). Subsequent research identified other strategic approaches as a means to enhance performance [7].

More recent studies have taken a more open-ended approach to investigating WM strategies, allowing participants to report strategies they had spontaneously used, rather than limiting them to narrow predefined categories. For example, Oblak et al. [8] used qualitative methods to explore individuals' experiences during a WM task, identifying a variety of strategies employed. Building on this work, Slana Ozimič et al. [9] reported that strategy use depends on specific task conditions.

However, previous research has either examined strategy use in a very open-ended manner – typically through interviews or free-response formats – or has focused on a narrow set of predefined strategies. What has been lacking is a structured yet comprehensive approach to quantitatively assess a broad range of strategies across task conditions. To address this gap – and building on previous literature (e.g., [8–10]) – we developed a structured questionnaire that included a broad set of strategies relevant to visual WM tasks. Using this questionnaire, we examined whether different task conditions encourage the use of different strategies, and whether the spontaneous use of such strategies is related to individuals' WM performance.

2 Methods

2.1 Participants

The study included 48 students (38 female, 8 male, 2 other), aged between 18 and 27 years (M = 19.98 years, SD = 2.23 years). None of the participants reported neurological diseases or conditions, and all participants had normal or corrected-to-normal vision.

2.2 Behavioral task and strategy questionnaire

A behavioral task was used to assess visual WM span. The task was presented on a Windows 11 computer using PsychoPy (v2023.1.1), and each participant completed two sessions lasting about 60 minutes. The task included four conditions (two conditions per session), the order of which was pseudo-randomized across participants. In the colors condition, participants had to memorize the colors of circles; in the shapes condition, they had to memorize the shapes of black outlines. In the both separate condition, they were presented with an equal number of colored circles and shape outlines and were asked to remember both features. In the both integrated condition, each presented object combined both features (a unique shape filled with a unique color), and participants were instructed to remember both the shape and the color of each object. In each trial, participants were presented with objects defined by color and/or shape for 500 ms. After a 2 s delay interval, they selected from the array of all possible colors and/or shapes those they remembered being shown, by clicking on them (up to the number originally presented). The number of stimuli increased until a stable WM span was obtained in each task condition. Throughout the trial, participants continually repeated the syllables "ta-ma" to suppress verbal rehearsal.

After each task condition, participants completed a strategy questionnaire consisting of 37 items, each formulated as a statement describing a possible strategy (e.g., "While viewing the stimuli, I actively searched for a pattern in the presented items"). The items were grouped into three phases of working memory (encoding, maintenance, and recall) and included visual, spatial, verbal, motor, auditory, long-term memory, and transmodal strategies. Participants reported, for each statement, the estimated frequency of its use during the preceding condition, expressed as a percentage.

2.3 Data analysis

Data were analyzed using R [11]. To assess the effect of task condition on the frequency of strategy use, one-way ANOVAs were conducted separately for each strategy, with task condition (color, shape, both integrated, both separate) as the independent variable, and the frequency of strategy use as the dependent variable. In addition, Pearson's correlation analyses were performed to examine relationships between strategy use and WM span within each task condition. All statistical tests were performed separately for each strategy, while FDR corrections were applied within task phases – encoding, maintenance and recall – to reflect the grouping of strategies by phase.

3 Results

First, we examined the internal consistency of the questionnaire, which was excellent (Cronbach's α = .88), with an average inter-item correlation of .16, indicating that items were related but not redundant. We then examined the mean self-reported frequency of strategy use across all task conditions. The three most frequently reported strategies during encoding were identifying distinctive features, inspecting visual features, and representing. During the maintenance phase, participants predominantly relied on afterimage, rehearsing a visual image, and impression, whereas in the recall phase, the most frequently endorsed strategies were comparing with a visual image, hunch, and applying verbal descriptions. Strategies that were, on average, used in less than 20% of trials were excluded from further analyses, as their low overall frequency suggested limited relevance for interpreting task performance (Figure 1).

On the remaining strategies, we conducted one-way ANOVAs to test for differences in strategy frequency across task conditions. After applying FDR correction, two strategies (staring and guessing) showed significant task condition effects. For staring, a significant effect of task condition was found, F(3, 183) = 5.99, p = .018, η² = .09, indicating small-to-medium effect size. Staring was reported most frequently in the both separate condition, followed by the shapes and both integrated conditions, and least frequently in the colors condition. Post hoc comparisons using Tukey's tests revealed that the significant effect of condition was primarily driven by higher reported use of the staring strategy in the both separate condition compared to the colors condition (mean diff. = 22.82 %, SE = 7.14 %, p < .001).

For guessing, there was also a significant effect of task condition, F(3, 183) = 5.28, p = .022, η² = .08, indicating small-to-medium effect size. Tukey's post hoc comparisons indicated that guessing was reported significantly more often in the both separate condition compared to the colors condition (mean diff. = 19.29 %, SE = 6.71 %, p = .001) and the shapes condition (mean diff. = –14.51 %, SE = 6.71 %, p = .024).

Lastly, correlation analyses were conducted to examine the relationship between the self-reported frequency of each strategy and WM span within each task condition. After FDR correction, only two correlations remained statistically significant. Use of the establishing a pattern strategy correlated positively with WM span in the colors condition (r = .45, p = .045), while use of the waiting strategy correlated negatively with WM span in the both separate condition (r = -.44, p = .016).

Figure 1: Average self-reported frequency (%) of strategy use across four task conditions.
Note. Error bars represent ±1 SE. Black vertical dotted lines represent 20% cut-off. Strategies below red horizontal lines were excluded from further analyses; * Strategies with statistically significant one-way ANOVAs after FDR correction.

4 Discussion

The aim of this study was to examine strategies individuals use under different visual WM task conditions, and whether the use of these strategies is related to WM performance. Using a newly developed, literature-based strategy questionnaire, the findings show that the use of strategies during WM tasks differs across task conditions, consistent with previous findings [9].

The most commonly reported strategies across all task conditions were visually based, such as focusing on the visual features of the stimuli (identifying distinctive features), relying on afterimage, or mentally comparing the current image with the one stored in WM (comparing with a visual image). The predominance of visual strategies is consistent with the nature of the task, which required remembering visual properties – colors and shapes – and thus naturally engages visual encoding and maintenance mechanisms [12]. In contrast, participants rarely used motor strategies (e.g., motor planning, rehearsing motor plans), likely because such strategies are more effective in tasks involving spatial or movement-related information [13].

Significant differences between task conditions emerged for the staring strategy, which reflects a passive approach where participants simply looked at the screen with the hope of remembering the stimuli, and the guessing strategy, characterized by providing a response without confidence or clear memory of the stimuli. Both strategies showed a similar pattern of use across task conditions: they were reported most frequently in the both separate condition, followed by the shapes and both integrated conditions, and least frequently in the colors condition. Post hoc analyses indicated that the difference in staring was driven by higher reported use in the both separate condition compared to the colors condition, while for guessing, significant differences were found between the both separate condition and both the shapes and the colors conditions. This pattern suggests that participants were more likely to rely on passive or less effortful strategies when task demands increased – particularly when multiple visual features had to be encoded simultaneously in separate objects. Similar findings have been reported in research showing that individuals tend to adopt less demanding strategies as task complexity and cognitive load increase [14].

Finally, our analysis showed that the use of two specific strategies was significantly related to WM span. In the colors condition, participants who more frequently established a pattern showed larger WM span, suggesting that combining colors into meaningful patterns may support memory performance. In contrast, in the both separate condition, greater reliance on waiting – waiting for the prompt to provide the answer – was associated with lower WM span. This may indicate that disengaging from active retrieval processes hinders performance in more complex tasks.

The present findings demonstrate that using a structured questionnaire allowed us to identify specific links between WM strategy use and performance – something that was not captured in our previous study [9], which relied on an open-ended interview approach.

5 Conclusion

Taken together, these findings suggest that WM strategies play an important role in the dynamic processes underlying WM. The complexity of these processes cannot be captured by WM span alone. While WM span provides a useful estimate of capacity, it does not account for the individual differences in strategy use that may influence task performance. Beyond the laboratory, such strategies are likely engaged in everyday contexts, for example, when navigating environments, remembering instructions, or interpreting visual information in educational or occupational settings. Future research in this field should further examine variability in the deployment of strategies, including how such strategies manifest on a neural level.

Acknowledgements

This work was supported by the Slovenian Research and Innovation Agency (Z5-50177 to N.P.S., J7-5553, J3-9264 and P3-0338 to G.R.).

References

[7] Morrison, A.B. and Chein, J.M. 2011. Does working memory training work? The promise and challenges of enhancing cognition by training working memory. Psychonomic Bulletin & Review. 18, 1 (Feb. 2011), 46–60. https://doi.org/10.3758/s13423-010-0034-0.
[8] Oblak, A., Slana Ozimič, A., Repovš, G. and Kordeš, U. 2022. What Individuals Experience During Visuo-Spatial Working Memory Task Performance: An Exploratory Phenomenological Study. Frontiers in Psychology. 13, (May 2022), 811712. https://doi.org/10.3389/fpsyg.2022.811712.
[9] Slana Ozimič, A., Oblak, A., Kordeš, U., Purg, N., Bon, J. and Repovš, G. 2023. The Diversity of Strategies Used in Working Memory for Colors, Orientations, and Positions: A Quantitative Approach to a First-Person Inquiry. Cognitive Science. 47, 8 (Aug. 2023), e13333. https://doi.org/10.1111/cogs.13333.
[10] Gonthier, C. 2021. Charting the Diversity of Strategic Processes in Visuospatial Short-Term Memory. Perspectives on Psychological Science. 16, 2 (Mar. 2021), 294–318. https://doi.org/10.1177/1745691620950697.
[11] R Core Team 2024. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing.
[12] Van Ede, F. 2020. Visual working memory and action: Functional links and bi-directional influences. Visual Cognition. 28, 5–8 (Sept. 2020), 401–413. https://doi.org/10.1080/13506285.2020.1759744.
[1] Baddeley, A.D. and Hitch, G. 1974. Working Memory. Psychology of Learning and Motivation. Elsevier. 47–89.
[2] Baddeley, A. 2012. Working Memory: Theories, Models, and Controversies. Annual Review of Psychology. 63, 1 (Jan. 2012), 1–29. https://doi.org/10.1146/annurev-psych-120710-100422.
[3] Takeuchi, H., Taki, Y. and Kawashima, R. 2010. Effects of Working Memory Training on Cognitive Functions and Neural Systems. Reviews in the Neurosciences. 21, 6 (Jan. 2010). https://doi.org/10.1515/REVNEURO.2010.21.6.427.
[4] Unsworth, N., Redick, T.S., Heitz, R.P., Broadway, J.M. and Engle, R.W. 2009.
Complex working memory span tasks and [13] Purg Suljič, N., Kraljič, A., Rahmati, M., Cho, Y.T., Slana Ozimič, higher-order cognition: A latent-variable analysis of the A., Murray, J.D., Anticevic, A. and Repovš, G. 2024. Individual relationship between processing and storage. Memory. 17, 6 (Aug. differences in spatial working memory strategies differentially 2009), 635–654. https://doi.org/10.1080/09658210902998047. reflected in the engagement of control and default brain networks. [5] Miller, G.A. 1956. The magical number seven, plus or minus two: Cerebral Cortex. 34, 8 (Aug. 2024), bhae350. Some limits on our capacity for processing information. https://doi.org/10.1093/cercor/bhae350. Psychological Review. 63, 2 (Mar. 1956), 81–97. [14] Tavares, W., Ginsburg, S. and Eva, K.W. 2016. Selecting and https://doi.org/10.1037/h0043158. Simplifying: Rater Performance and Behavior When Considering [6] Cowan, N. 2010. The Magical Mystery Four: How Is Working Multiple Competencies. Teaching and Learning in Medicine. 28, 1 Memory Capacity Limited, and Why? Current Directions in (Jan. 2016), 41–51. Psychological Science. 19, 1 (Feb. 2010), 51–57. https://doi.org/10.1080/10401334.2015.1107489. https://doi.org/10.1177/0963721409359277. 14 Machine bias: new experiments with COMPAS data * Ana Farič † Ivan Bratko Fačulty of Edučation Fačulty of Computer and Information Sčienče University of Ljubljana University of Ljubljana Slovenia Slovenia af27987@student.uni-lj.si bratko@fri.uni-lj.si Abstract definitional diffičulties. Some studies člaim COMPAS is analysis of the COMPAS rečidivism predičtion system. While inčonsistenčy raises a deeper question: to what extent do some studies člaim COMPAS is račially biased and others observed disparities reflečt the various models versus the argue the opposite, our repličation and extension of prior underlying data and sočial čontext? This paper revisits the debate on mačhine bias through an fairness metrič is applied. 
Beyond the tečhničal debate, this račially biased, while others disagree, depending on whičh work show that ačross diverse methods aččuračy čonsistently čonverges at around 66-67%. Moreover, error distributions follow a stable pattern: higher false positive 2 Understanding the concept of machine rates for blačk defendants and higher false negative rates for bias white defendants. We argue that this čonvergenče reflečts inherent diffičulty of this predičtion problem and probably In čomputer sčienče, dozens of fairness metričs and bias yet unexplained asymmetries in this domain. Our findings definitions exist, often in čontradičtion with one another [16, suggest that debates on fairness should move beyond model 17, 18, 19]. Philosophers, legal sčholars, and sočial sčientists čhoiče to address systemič disparities that shape observed have long debated the meaning of bias, and čomputer outčomes. sčientists fače the additional čhallenge of operationalizing these abstračt čončepts into measurable čriteria [12]. Keywords Despite numerous attempts to resolve this ambiguity, no čonsensus has emerged. Even mathematičal definitions, artifičial intelligenče, mačhine bias, fairness, COMPAS system while prečise, often lačk čončrete examples that would make them appličable in real-world dečision-making čontexts. 1 Introduction Bias in mačhine learning (ML) is a multifačeted čončept, Calls for unbiased AI systems are inčreasingly more čommon enčompassing both tečhničal and sočial dimensions. in regulation debates. For example, in September 2024, one Researčhers identify three broad types of bias: of the GPAI (Global Partnership on Artificial Intelligence) 1. Indučtive/learning bias: in supervised learning, an working groups released a report [13] rečommending that AI algorithm seeks a funčtion that predičts outčomes from data. 
system providers be held liable for disčriminatory impačts Many funčtions may fit the training data, but most fail to and required to čompensate individuals harmed by generalize well. Preferential bias is needed to selečt čertain algorithmič bias. Although the group attempted to člarify and funčtions over others, guiding learning toward useful better define the notion of bias in the revised report [14], generalizations. As sučh, bias is a nečessary čomponent that released in November 2024, it ultimately offered no pračtičal enables learning [16]. metričs or other čriteria to determine with čonfidenče 2. Historičal bias: reflečts real-world prejudičes embedded in whether a system is biased or not. This highlights a broader the data. As sučh, even perfečtly measured data may produče čhallenge: while the demand for unbiased AI systems is biased outčomes if the underlying reality is disčriminatory [1, growing, even well-intentioned poličymakers struggle to 16, 19]. translate abstračt čončepts of fairness into ačtionable, 3. Biases that arise during data generation: spečifičation, measurable čriteria. The COMPAS rečidivism predičtion measurement/observation, sampling/population, annotator system exemplifies these bias etč. [5, 16, 24]. In pračtiče, bias is most often disčussed in terms of its sočial čonsequenčes, sučh as when models člassify individuals ∗Article Title Footnote needs to be captured as Title Note differently based on protečted attributes like rače or gender †Author Footnote to be captured as Author Note [16]. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or Computer sčientists have formalized bias through various distributed for profit or commercial advantage and that copies bear this notice fairness metričs, inčluding: and the full citation on the first page. Copyrights for third-party components of this work must be honored. 
For all other uses, contact the owner/author(s).
Information Society 2025, 6–10 October 2025, Ljubljana, Slovenia
http://doi.org/10.70314/is.2025.cogni.13
© 2025 Copyright held by the owner/author(s).

1. Demographic parity: equal positive prediction rates across groups [16];
2. Equalized odds: equal false positive rates (FPR) and false negative rates (FNR) across groups [15];
3. Predictive parity: equal prediction accuracy across groups [23].

These metrics are mutually incompatible except under certain trivial conditions [17]. For a more comprehensive survey of existing bias definitions and their limitations, we refer the reader to our previous work [9].

3 COMPAS
A good example of the problems caused by the lack of a unified definition is COMPAS (Correctional Offender Management Profiling for Alternative Sanctions). It is a model still used by US courts to assess the likelihood that a defendant, if released, will reoffend within two years of the evaluation date [2, 20]. The assessment is based on 137 attributes about the defendant, including personal information and criminal history. Race is not considered in the evaluation [8]. The model assists judges in making decisions about bail and sentencing, particularly in determining whether defendants awaiting trial are too dangerous to be released [6].

The model assigns defendants a score between 1 and 10, indicating how likely they are to reoffend. These scores have a significant impact on the lives of the defendants. Those rated as medium- or high-risk (scores 5-10) are more often held in detention until trial, while low-risk defendants (scores 1-4) are more frequently released [2, 6].

3.1 Previous research on COMPAS
In 2016, ProPublica [2] sparked an intense debate by claiming that COMPAS was biased against black defendants. Over the following years, researchers reached contradictory conclusions, highlighting the difficulty of assessing bias in this system. Northpointe, the developer of COMPAS, rejected ProPublica's claims, arguing that ProPublica's analysis was methodologically flawed and that ProPublica should have used standard fairness measures such as AUC-ROC, under which COMPAS showed no racial bias [7]. Similarly, Flores et al. [10] argued that there is no significant difference in predictive accuracy between white and black defendants. In the AI literature, negative assessments of COMPAS prevail.

Subsequent studies further complicated the debate. Dressel and Farid [8] showed that COMPAS performs no better than laypeople in predicting recidivism, and that a simple linear classifier with only two or seven attributes produces accuracy comparable to COMPAS's 137-attribute system. Rudin [22] reached a similar conclusion with a three-rule interpretable model based on just two attributes. These findings questioned the added value of complex risk assessment tools in this domain.

Other research emphasizes inherent trade-offs in fairness metrics. Corbett-Davies et al. [6] and Zafar et al. [27] both highlight the impossibility of simultaneously satisfying competing fairness definitions, given differing base rates of recidivism across racial groups and different fairness metrics. Conceptual and formal tensions such as these help explain why analyses of COMPAS produce conflicting assessments of the same system [9].

4 Our analysis
We noticed an intriguing pattern in the previously mentioned studies: models of varying complexity produce comparable accuracy, FNR, and FPR. While these results have been reported independently, they have not been systematically compared within the same analytical framework, nor have their broader implications been thoroughly examined. In this paper, we aim to replicate prior findings to confirm their robustness, and to extend the discussion by investigating why such convergence occurs across different methods.

4.1 Method
We used the publicly available COMPAS dataset released by ProPublica [2] on GitHub (https://github.com/propublica/compas-analysis). To ensure comparability with previous studies, we selected the same version of the dataset as used by [8]. The dataset contains 53 attributes, including demographic information, criminal history, and COMPAS risk scores, as well as protected attributes such as race and sex, for 7214 defendants from Broward County, Florida. Following previous researchers, we filtered the dataset to include only black and white defendants, resulting in a final sample of 6150 individuals.

We trained the following models using the Orange data mining platform, applying an 80%/20% training/testing split that was repeated 10 times to compute true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), both overall and separately for black and white defendants:
1. Logistic regression: a simple linear classifier, trained with either 6 attributes (sex, age, prior crimes, crime degree, number of juvenile misdemeanors and felonies) or 2 attributes (age and priors). We excluded the crime description attribute (used by [8]) because it contains over 400 different values, which they reduced to 63 for human judgement purposes. As we could not reproduce this exact transformation, we omitted it to test whether the remaining attributes alone suffice.
2. Decision tree: constructed to approximate Rudin's [22] rule-based model. Two attributes (age category and priors) were used, and the depth of the tree was limited to 5.

Models were evaluated using:
1. Accuracy: proportion of correct predictions on the test set.
2. FPR: proportion of non-recidivists incorrectly predicted to reoffend.
3. FNR: proportion of recidivists incorrectly predicted not to reoffend.

For each model, TP, TN, FP, and FN were first recorded for each of the 10 repetitions.
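The three evaluation measures are simple ratios of the confusion counts. The following is a minimal sketch, not the authors' Orange workflow: the names `Counts`, `count` and `metrics` are illustrative, the toy labels are made up, and it assumes that per-repetition counts are summed (pooled) before the ratios are taken.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion counts for one group, for one repetition or pooled."""
    tp: int = 0
    tn: int = 0
    fp: int = 0
    fn: int = 0

    def __add__(self, other: "Counts") -> "Counts":
        # Pooling two sets of counts is element-wise addition.
        return Counts(self.tp + other.tp, self.tn + other.tn,
                      self.fp + other.fp, self.fn + other.fn)


def count(y_true, y_pred) -> Counts:
    """Tally TP/TN/FP/FN for binary labels (1 = reoffended)."""
    c = Counts()
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            c.tp += 1
        elif t == 0 and p == 0:
            c.tn += 1
        elif t == 0 and p == 1:
            c.fp += 1
        else:
            c.fn += 1
    return c


def metrics(c: Counts) -> dict:
    """Accuracy, FPR and FNR; assumes the group has both classes present."""
    total = c.tp + c.tn + c.fp + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "fpr": c.fp / (c.fp + c.tn),  # non-recidivists predicted to reoffend
        "fnr": c.fn / (c.fn + c.tp),  # recidivists predicted not to reoffend
    }


# Two toy repetitions for a single group (hypothetical labels).
reps = [
    count([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]),
    count([0, 0, 1, 1, 0], [0, 1, 1, 0, 0]),
]
pooled = sum(reps, Counts())
print(metrics(pooled))
```

Running the same functions once per race group gives the per-group FPR and FNR values whose difference is the quantity at stake in the equalized-odds criterion.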
These counts were then pooled across repetitions, and accuracy, FPR and FNR were calculated from the pooled counts for each race group. Finally, metrics were averaged across all repetitions to produce the values reported in tables 1 and 2.

The results from our models were directly compared to the reported metrics from [2, 8, 22], allowing us to compare predictive performance across models.

4.2 Results
The results from previous researchers are summarized in table 1. Our results are summarized in table 2.

Across all methods (table 2), overall accuracy converged around 66-67%, consistent with the performance reported by ProPublica [2] and Dressel and Farid [8]. While our exact error rates differ somewhat from those reported previously, the same pattern in error distribution was observed: black defendants exhibited higher FPR, whereas white defendants exhibited higher FNR. Finally, compared to [8], who incorporated a reduced version of the crime description attribute, our results suggest that excluding this feature does not substantially change performance.

Table 1: Columns A-E summarize predictive performance across different models and conditions from previous researchers. Column A reports human judgements without access to information about race; B reports human judgements with race; C shows COMPAS predictions as reported by ProPublica; D and E show logistic regression (LR) models trained on 7 or 2 attributes respectively. Accuracy (CA), FPR and FNR are reported overall and separately for two races.

Metric | A: human (no race) | B: human (race) | C: COMPAS | D: LR-7 | E: LR-2
CA (overall) | 67.0% | 66.5% | 65.2% | 66.6% | 66.8%
CA (black) | 68.2% | 66.2% | 64.9% | 66.7% | 66.7%
CA (white) | 67.6% | 67.6% | 65.7% | 66.0% | 66.4%
FPR (black) | 37.1% | 40.0% | 40.4% | 42.9% | 45.6%
FPR (white) | 27.2% | 26.2% | 25.4% | 25.3% | 25.3%
FNR (black) | 29.2% | 30.1% | 30.9% | 24.2% | 21.6%
FNR (white) | 40.3% | 42.1% | 47.9% | 47.3% | 46.1%

Table 2: Columns A-D summarize predictive performance of our models. A shows LR trained on 6 attributes, excluding race; B shows LR trained on the same 6 attributes with race included; C shows LR trained on 2 attributes; D shows a decision tree (DT) trained on 2 attributes. CA, FPR and FNR are reported overall and separately for black and white defendants.

Metric | A: LR-6 (no race) | B: LR-6 (race) | C: LR-2 | D: DT-2
CA (overall) | 67.2% | 67.1% | 66.5% | 66.8%
CA (black) | 66.9% | 67.2% | 66.7% | 66.8%
CA (white) | 67.1% | 66.4% | 66.1% | 67.7%
FPR (black) | 29.0% | 31.1% | 31.1% | 35.2%
FPR (white) | 15.1% | 15.2% | 16.5% | 20.1%
FNR (black) | 36.7% | 34.5% | 35.4% | 31.3%
FNR (white) | 61.7% | 61.4% | 60.6% | 51.0%

Figures 1-3 present an example of a decision tree trained on two attributes, age category (1. < 25, 2. 25-45, 3. > 45) and number of prior offences, with the tree depth limited to 5. The tree splits defendants into subgroups, with the leaves showing the predicted recidivism outcome (0 = predicted not to reoffend, 1 = predicted to reoffend) and the proportion of the majority class. To improve readability, the tree is divided into three figures: figure 1 shows the root node and the initial split by the number of priors, figure 2 shows the left subtree (defendants with at most 2 priors), and figure 3 shows the right subtree (defendants with > 2 priors). Among the latter, defendants older than 45 are further divided by priors. The rightmost leaf in figure 3 predicts that defendants with more than 20 priors will reoffend (1), with probability 76.9%.

Figure 1: Root node of the decision tree and the initial split into left and right subtrees.
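How a leaf arrives at its label and its reported proportion can be illustrated with a short sketch. It assumes the usual majority-vote convention for classification-tree leaves. The function name `leaf_prediction`, the tie-breaking rule and the counts are hypothetical: the paper does not state leaf sizes, and the numbers below were chosen only so that the share reproduces the reported 76.9%.

```python
def leaf_prediction(n_not_reoffended: int, n_reoffended: int):
    """Return (predicted_class, majority_share) for a single tree leaf.

    The predicted class is the majority class among the training
    defendants that fall into the leaf; the share is that class's
    proportion. Ties go to class 1 (an arbitrary choice for this sketch).
    """
    total = n_not_reoffended + n_reoffended
    if total == 0:
        raise ValueError("empty leaf")
    if n_reoffended >= n_not_reoffended:
        return 1, n_reoffended / total
    return 0, n_not_reoffended / total


# Hypothetical counts: 10 of 13 defendants in the leaf reoffended.
label, share = leaf_prediction(n_not_reoffended=3, n_reoffended=10)
print(label, f"{share:.1%}")  # 1 76.9%
```

The remaining share (here 23.1%) is the fraction of defendants in the leaf for whom the leaf's prediction is wrong on the training data.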
Figure 2: Left subtree of the decision tree

Figure 3: Right subtree of the decision tree

To further illustrate how the decision tree makes predictions, figures 4 and 5 show the distribution of defendants by race within two example leaves (the first two columns in both figures correspond to black defendants, the right ones to white defendants). The bar charts make it clear that, although the prediction is the same within each leaf (1 = will reoffend, shown in red; the blue color indicates the number of defendants who did not reoffend), the underlying racial composition of these subgroups can vary substantially. Additionally, the figures illustrate how estimated prediction errors and class balance differ across leaves.

Figure 4: Distribution of defendants by race in the leaf (right-most leaf in figure 3) corresponding to the path: > 2 priors --> age > 45 --> > 20 priors. All defendants in this subgroup are predicted to reoffend, with 76.9% probability.

Figure 5: Distribution of defendants by race in the leaf (left-most leaf in figure 3) corresponding to the path: > 2 priors --> age < 25 --> 3 priors. All defendants in this subgroup are predicted to reoffend with probability 65.2%.

4.3 Discussion
Our findings confirm and extend those of [2, 8, 22]. Across models of varying nature and complexity (black box, logistic regression, interpretable rule-based, and even human judgement), predictive accuracy consistently hovers around 66-67%. Moreover, we replicated the characteristic error distribution pattern: higher FPR for black defendants, and higher FNR for white defendants. We extend the mentioned prior research by demonstrating this convergence using decision trees as an approximation of Rudin's [22] interpretable rules.

While [22] emphasizes the use of inherently interpretable models, and [8] question the overall utility of algorithmic recidivism prediction, our work shifts the discussion toward the underlying reasons why all these methods yield similar results, in particular similar error patterns. The convergence of predictions across methods suggests that the limitations may lie less in model choice and more in the data and domain itself, which prior analyses often overlook.

Beyond dataset quality, structural factors such as racial disparities in arrests and sentencing likely drive the consistent error patterns observed across all models. As [11] reports, the lifetime likelihood of imprisonment for black men was one in three for those born in 1981, and one in five for those born in 2001. A report from 2018 [21] emphasizes that the imprisonment rate for black adults is 5.9 times the rate for white adults, and even higher in some states. These disparities exist for both the least and the more serious offences; 56% of people imprisoned nationwide for a drug offence are black or Latino, and 48% of people serving life sentences are black. Another report [25] emphasizes that 56.4% of those serving life without parole sentences are black. Additionally, Williamson Kramer's framework [26] emphasizes that the higher crime rates observed among black individuals are not indicative of inherent crime tendencies, but rather reflect systemic economic disparities, which are often the result of historical and ongoing policies that have marginalized black communities, limiting their access to resources and opportunities. Therefore, the convergence of predictive models like COMPAS with other simple ML models and even lay people's judgements may not solely be a technical issue but also a reflection of deeper societal inequalities.

While our study confirms agreement across models and highlights the importance of structural factors, several avenues for further research remain. At a methodological level, further work could explore different versions of the ProPublica dataset, test additional feature combinations, and evaluate a wider range of ML models to assess the robustness of these patterns. At a broader level, additional research should examine the underlying systemic factors that drive disparities in recidivism predictions, thus contextualizing algorithmic predictions within real-world social dynamics and informing policy discussions on the responsible use of predictive models in the justice system.

5 Conclusion
Our study revisits the COMPAS recidivism prediction debate by replicating and extending previous findings and discussing why different methods (ranging from black boxes to simple linear predictors, interpretable rule-based models, and human judgements) consistently converge on similar predictive performance and error patterns. Across all methods, accuracy hovered around 66-67%, with characteristic error distributions showing higher FPR for black defendants and higher FNR for white defendants. While prior research has documented this convergence, its proper interpretation and broader implications have received less attention. We emphasize that COMPAS may be wrongfully vilified; while skepticism regarding algorithmic risk assessment is warranted, using ML systems to inform decisions should not be dismissed outright, as they hold potential to support informed decision-making if used responsibly.

More importantly, we argue that the debate should shift toward understanding why such convergence occurs. Our findings suggest that it reflects domain-specific and structural factors, including disparities in arrest, sentencing, and systemic socio-economic inequalities that induce observed recidivism rates. By examining these elements, alongside limitations in commonly used datasets, we can better contextualize predictive performance and the persistence of racial disparities.

References
[1] Alelyani, S. 2021. Detection and Evaluation of Machine Bias. Applied Sciences 11, 4. DOI:https://doi.org/10.3390/app11146271.
[2] Angwin, J., Kirchner, L., Larson, J. & Mattu, S. 2016. Machine Bias: There's software used across the country to predict future criminals. And it's biased against blacks. ProPublica.
[3] Barenstein, M. 2019. ProPublica's COMPAS Data Revisited. ArXiv:1906.04711. DOI:https://doi.org/10.48550/arXiv.1906.04711.
[4] Beck, A. J. 2021. Race and Ethnicity of Violent Crime Offenders and Arrestees, 2018. U.S. Department of Justice, Statistical Brief.
[5] Chakraborty, J., Majumder, S. & Menzies, T. 2021. Bias in Machine Learning Software: Why? How? What to do? ESEC/SIGSOFT FSE. DOI:https://doi.org/10.1145/3468264.3468537.
[6] Corbett-Davies, S., Pierson, E., Feller, A. & Goel, S. 2016. A computer program used for bail and sentencing decisions was labeled biased against blacks. It's actually not that clear. The Washington Post.
[7] Dieterich, W., Mendoza, S. & Brennan, T. 2016. COMPAS Risk Scales: Demonstrating Accuracy Equity and Predictive Parity.
[8] Dressel, J. & Farid, H. 2018. The accuracy, fairness, and limits of predicting recidivism. Science Advances 4, 1. DOI:https://doi.org/10.1126/sciadv.aao5580.
[9] Farič, A. & Bratko, I. 2024. Machine Bias: A Survey of Issues. Informatica 48, 2. DOI:https://doi.org/10.31449/inf.v48i2.5971.
[10] Flores, A. W., Lowenkamp, C. T. & Bechtel, K. 2016. False Positives, False Negatives, and False Analyses: A Rejoinder To "Machine Bias: There's software used across the country to predict future criminals. And it's biased against blacks." Federal Probation 80, 2.
[11] Ghandnoosh, N. 7.12.2023. One in Five: Racial Disparity in Imprisonment – Causes and Remedies. The Sentencing Project. https://www.sentencingproject.org/reports/one-in-five-racial-disparity-in-imprisonment-causes-and-remedies/.
[12] Goel, N., Yaghini, M. & Faltings, B. 2018. Non-Discriminatory Machine Learning through Convex Fairness Criteria. The 32nd AAAI Conference on Artificial Intelligence 32, 1.
[13] GPAI 2024. Towards Substantive Equality in Artificial Intelligence: Transformative AI Policy for Gender Equality and Diversity. Report, September 2024, Global Partnership on AI.
[14] GPAI 2024. Towards Substantive Equality in Artificial Intelligence: Transformative AI Policy for Gender Equality and Diversity. Report, November 2024, Global Partnership on AI.
[15] Hardt, M., Price, E. & Srebro, N. 2016. Equality of Opportunity in Supervised Learning. ArXiv:1610.02413. DOI:https://doi.org/10.48550/arXiv.1610.02413.
[16] Hellstrom, T., Dignum, V. & Bensch, S. 2020. Bias in Machine Learning – What is it Good for? ArXiv:2004.00686. DOI:https://doi.org/10.48550/arXiv.2004.00686.
[17] Kleinberg, J., Mullainathan, S. & Raghavan, M. 2021. Inherent Trade-Offs in the Fair Determination of Risk Scores. ArXiv:1609.05807. DOI:https://doi.org/10.48550/arXiv.1609.05807.
[18] Mehrabi, A., Morstatter, F., Saxena, N., Lerman, K. & Galstyan, A. 2021. A Survey on Bias and Fairness in Machine Learning. ACM Computing Surveys. DOI:https://doi.org/10.1145/3457607.
[19] Ntoutsi, E., Fafalios, P., Gadiraju, U., Iosifidis, V., Nejdl, W., Vidal, M. E., … Staab, S. 2020. Bias in data-driven artificial intelligence systems – An introductory survey. WIREs Data Mining and Knowledge Discovery 10, 3. DOI:https://doi.org/10.1002/widm.1356.
[20] Porębski, A. 2023. Machine learning and law. Research Handbook on Law and Technology. Cheltenham, UK: Edward Elgar Publishing.
[21] Report to the United Nations on Racial Disparities in the U.S. Criminal Justice System. 19.4.2018. The Sentencing Project. https://www.sentencingproject.org/reports/report-to-the-united-nations-on-racial-disparities-in-the-u-s-criminal-justice-system/.
[22] Rudin, C. 2018. Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead. ArXiv:1811.10154. DOI:https://doi.org/10.48550/arXiv.1811.10154.
[23] Saravanakumar, K. K. 2021. The Impossibility Theorem of Machine Fairness: A Causal Perspective. ArXiv:2007.06024. DOI:https://doi.org/10.48550/arXiv.2007.06024.
[24] Sun, O., Nasraoui, O. & Shafto, P. 2020. Evolution and impact of bias in human and machine learning algorithms interaction. PLOS ONE 15, 8. DOI:https://doi.org/10.1371/journal.pone.0235502.
[25] Walsh, A. 15.8.2016. The criminal justice system is riddled with racial disparities. Prison Policy Initiative. https://www.prisonpolicy.org/blog/2016/08/15/cjrace/.
[26] Williamson Kramer, C. 13.2.2024. Systemic Racism in Crime: Do Blacks Commit More Crimes Than Whites? Liberty Matters. https://oll.libertyfund.org/publications/liberty-matters/2024-02-13-systemic-racism-in-crime-do-blacks-commit-more-crimes-than-whites.
[27] Zafar, M. B., Valera, I., Gomez Rodriguez, M. & Gummadi, K. P. 2017. Fairness constraints: Mechanisms for fair classification. ArXiv:1507.05259. DOI:https://doi.org/10.48550/arXiv.1507.05259.

Primerjava lastnosti človeške kognicije in umetne inteligence
Comparison of the characteristics of human cognition and artificial intelligence

Monika Jamšek, Faculty of Public Administration, University of Ljubljana, Gosarjeva ulica 5, 1000 Ljubljana, Slovenia, monika.jamsek@gmail.com
Rok Smodiš, Faculty of Education, University of Ljubljana, Kardeljeva ploščad 16, 1000 Ljubljana, Slovenia, rs68734@student.uni-lj.si
Marko Jordan, Department of Intelligent Systems, Jozef Stefan Institute, 1000 Ljubljana, Slovenia, marko.jordan@ijs.com
Matjaž Gams, Department of Intelligent Systems, Jozef Stefan Institute, 1000 Ljubljana, Slovenia, matjaz.gams@ijs.com

Abstract
The paper offers an in-depth theoretical framework that compares key cognitive functions of humans and artificial intelligence (AI) systems.
Drawing on the latest research from neuroscience, cognitive psychology, artificial intelligence, and the philosophy of mind, it presents processes such as attention, memory, learning, language, creativity, and emotions, along with their analogs in artificial systems. The focus is on the strengths and limitations of both perspectives, highlighting ethical dimensions and the risk of cognitive atrophy that may result from excessive use of AI. The core of the paper features a comparative table that visually displays the differences, strengths, and weaknesses of humans and AI. Special attention is given to the concept of hybrid intelligence, which points toward a future of collaboration between both forms of intelligence. The conclusion emphasizes that the future will not revolve around replacing humans, but rather around a complementary partnership that opens up a new cognitive paradigm.

Keywords
human cognition, artificial intelligence, emotions, AI ethics, hybrid intelligence, comparative analysis

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
Information Society 2025, 6–10 October 2025, Ljubljana, Slovenia
© 2025 Copyright held by the owner/author(s).
https://doi.org/10.70314/is.2025.cogni.5

1 Introduction
Research on human cognition, from perception and memory to learning, language, creativity, and emotions, is a core area of psychology, neuroscience, and cognitive science. Neisser [1] defines cognition as the information-processing operations that support adaptive behavior, while contemporary models emphasize the integration of executive functions, emotions, and motivation [2, 3]. Tononi [4] treats consciousness through five informational axioms and theorems.

In parallel, artificial intelligence, and large language models (LLMs) in particular, has made remarkable progress in recent years: LLMs systematically improve results on standardized tasks, handle complex dialogues, and in many informal tests in the spirit of the Turing test interlocutors often fail to recognize that they are communicating with a machine (e.g. [5]). This outwardly visible competence reinforces the impression that the human level is being approached.

Nevertheless, progress on "human" properties such as consciousness, phenomenal experience, deep semantic grounding, and metacognitive self-regulation is considerably more modest. A potential explanation is that contemporary models have not yet solved the problem of multiple knowledge. In terms of Gams's principle of multiple knowledge, the question is how to connect heterogeneous forms of knowledge (statistical patterns, symbolic structures, causal relations, episodic and embodied representations) into a unified, consistent, and self-referential model of the world, and how to perform stable content-level processing and interpretation of meaning and intent on top of them. As long as this integration problem remains open, AI can achieve top results on tasks without showing clear progress toward the properties we associate with conscious human thinking.

The article systematically examines how far ChatGPT (an LLM) reaches in the direction of "cognitive" properties, especially regarding the ability to pass the Turing test and the question of consciousness. The two authors first present a hierarchy of Turing tests, from the classic short verbal test to the expert/adversarial, physical, "total" and "truly total" tests, and argue that ChatGPT can approach success mainly with naive interrogators, whereas experienced or expert interrogators expose it easily.

This paper is in a sense a continuation of article [6], which analyzes and assesses "consciousness" according to Tononi's integrated information theory (IIT) [4]. After a brief presentation of the IIT axioms and theorems, GPT models and humans are compared. The conclusion is that ChatGPT surpasses older AI systems in linguistic competence and breadth of knowledge, but lags markedly behind biological systems in the integration of information and therefore has no phenomenal consciousness; it is an advanced tool, not an "informational living being".

Methodologically, the article combines a review of the literature on the Turing test and theories of consciousness with the author's own numerous expert-designed dialogue tests, which reveal the model's limitations (e.g. the lack of stable semantic grounding/intentionality). The conclusion stresses a practical implication: LLMs are best treated as very capable but non-sentient tools, which calls for caution against anthropomorphization and for awareness of their limits when judging meaning and intent. At the same time, the enormous advances of AI in some areas are analyzed. The key question is whether development has stalled or continues at an undiminished pace.

In this contribution we comparatively examine key human cognitive functions and their analogs in AI, explicitly separating external task performance from internal mechanisms and properties. We present an overview framework, an updated comparative table, and a discussion of hybrid intelligence, in which the combination of human and artificial capabilities offers a practical way forward, while also warning of the ethical dimensions and the risk of cognitive atrophy due to excessive use of AI.

2 Human cognitive functions
Human cognition covers a broad spectrum of interconnected processes that together enable adaptive behavior and complex information processing. In this section we synthesize human mechanisms as a reference framework for the later comparison with AI. One of the basic mechanisms is attention, which directs mental resources to relevant stimuli; neuroscientific research shows that fronto-parietal brain networks play a key role in this [7]. Memory is also an indispensable part of the cognitive system and is divided into sensory, short-term, working, and long-term memory. Baddeley [8] decomposed working memory into the phonological loop, the visuospatial sketchpad, and the central executive, while Cowan [9] finds that short-term memory typically does not exceed four units. These mechanisms form the basis for learning and reasoning, where humans often learn from few data and generalize compositionally to new circumstances. Raven's matrices are an indicator of the capacity for abstract thinking [10], while Stanovich [11] points out that traditional intelligence tests do not capture rationality, which is key to effective decision-making.

Language holds a special place in human cognition, since it enables symbolic representation, social coordination, and the transmission of cultural knowledge across generations. Linguistic abilities are semantically grounded in embodied and social experience. Closely linked to language is creativity, which goes beyond mere divergent thinking and results from the interaction of the individual, their domain knowledge, and the broader sociocultural environment [12, 13]. Cognitive processes are also strongly influenced by emotions and motivation. Damasio [3] showed that human rationality weakens without an emotional component, while Immordino-Yang [14] stresses that emotions are inseparably linked to learning, decision-making, and social behavior.

An additional level of complexity comes from metacognition and consciousness, which enable self-reflection and the strategic adaptation of learning, including phenomenal experience, which is essential for the effective regulation of one's own thought processes. Finally, social cognition plays a key role; it includes the ability to understand others through the so-called theory of mind [15], which is the basis for successful cooperation, empathy, and the building of social relationships. This human architecture of multiple knowledge, from sensorimotor and episodic to symbolic and causal representations, will serve below to explain why today's LLMs, despite rapid progress on tasks, advance much more slowly on properties such as consciousness and metacognition [6].

3 Cognitive properties of artificial intelligence
We treat cognitive properties as functional analogs of human abilities; in AI they are realized through different mechanisms (statistical representations, optimization, pattern matching) than in humans, so we separate task-solving performance from internal properties (consciousness, phenomenal experience, metacognitive self-regulation).

Perception and attention. In computer vision, deep networks have reached (and on some tasks exceeded) human-level recognition [16]. In language processing, the distributed attention mechanism enables effective context tracking and the construction of hierarchical representations [17], but this remains an algorithmic selection of information, not consciously directed attention with intent [18].

Memory. LLMs store "knowledge" in parameters and in the current context; external memories (RAG and similar) extend the reach but do not form episodic or embodied memory in the human sense [5, 18]. Semantic grounding is indirect and unstable.

Learning. Progress rests on massive corpora and gradient optimization; few-shot and in-context learning abilities point to pattern-dependent generalization, but transfer across domains and robustness outside the training distribution remain limited [19, 20, 21].

Reasoning and causality. Results are high on tests of knowledge and understanding (MMLU) [22] and on problem collections (ARC) [21], but commonsense/causal reasoning remains fragile; this is also visible on the family of Winograd Schema derivatives [23]. Verbalizing the steps does not imply a real causal structure [20].

Language. Fluency, style, and coherence are close to the human level, but hallucinations and deficient pragmatics reveal the absence of stable semantic grounding in experience and social context [5, 24].

Creativity. Generative systems are extremely productive at recombining and varying known patterns, but they more rarely achieve culturally embedded conceptual breakthroughs ([25]; cf. also the human perspective in [12]).

Emotions and motivation. Affective computing enables the recognition and simulation of emotional signals, and reward policies (e.g. RLHF) shape model behavior; these are not internal emotions or intentions in the sense of phenomenal experience [26] (in contrast with [3]).

Metacognition. Models can estimate uncertainty and correct themselves through external checks, which is procedural evaluation, not introspective self-regulation as known in humans [20; 11 – delimitation of rationality].

Social cognition. LLMs successfully imitate elements of theory of mind in textual scenarios, but the success often stems from statistical exposure to patterns, not from a real understanding of intent, irony, or pragmatic implicatures in situated contexts [24] (cf. human ToM in [15]).

Consciousness and the Turing test. There is no empirical evidence of phenomenal consciousness in today's models; convincing linguistic behavior, or even passing informal Turing tests, is not a sufficient criterion [26, 27]. Analyses of tests in the spirit of Turing show that LLMs can mislead interlocutors, but this does not settle the question of understanding [5, 6].

Interim conclusion and the principle of multiple knowledge. In the last three years AI has advanced dramatically in solving tasks (language, context, external memory, part of reasoning), while progress on human properties (consciousness, deep semantic grounding, metacognition) has been minimal. The explanation is consistent with Gams's principle of multiple knowledge: what remains unsolved is the integration of statistical, symbolic, causal, episodic, and embodied representations into a unified, consistent, and self-referential world model that would enable a stable interpretation of meaning and intent [18, 19, 20, 6]. At the practical level this also raises questions about the impact on the user (e.g. cognitive dependence/atrophy; [28]) and about the need for hybrid human–AI designs, discussed below [29, 30, 31]. At first glance, the revolutionary improvement in how LLMs operate is consistent with the great multiplicity of deep neural networks, which capture knowledge from the whole world; on the other hand, the computation itself is still too classical, and not multiple in content, to achieve true intelligence according to the principle of multiple knowledge. It seems that one more breakthrough is needed in the area of multiple reasoning.

4 Comparative analysis
Table 1 compares the cognitive functions of humans and AI, separating function from mechanism and adding metrics/evals and hybrid collaboration patterns. It is aligned with the principle of multiple knowledge (integration of statistical, symbolic, causal, episodic, and embodied representations).

Table 1: Comparison of cognitive properties of humans and AI

Dimension | Human – mechanism/property | AI – mechanism/property | Typical metrics / evals | Hybrid advantage (human ↔ AI) | Main risks
Attention | Selective, consciously controlled; fronto-parietal networks; limited capacity | Algorithmic self-attention; long contexts; no phenomenal experience | Long contexts (recall@k), robustness to noise, RT/accuracy in visual tasks | AI filters and summarizes; human validates meaning and goals | Data bias, "lost-in-the-middle"
Memory | Working, declarative/procedural, episodic | Parameters + context; external memory (RAG); no episodic memory | Factuality/consistency, retrieval precision/recall, long-context QA | RAG for facts + human for context and sources | Hallucinations, "source amnesia"
Learning | Few examples, compositionality, analogies | Learning on massive corpora, gradient optimization; limited transfer | Sample efficiency, OOD generalization, few-shot | Human shapes abstractions; AI optimizes | Overfitting to benchmarks, reward hacking
Reasoning (commonsense/causal) | Symbolic + causal modeling | Statistical inference; chained explanation without necessary causality | ARC, MMLU-reasoning, WinoGrande, causal benchmarks | AI generates hypotheses; human performs causal judgment | Convincing rationalizations without understanding
Language (semantics/pragmatics) | Semantic grounding, situated pragmatics, irony | High fluency; limited grounding; hallucinations | Hallucination rate, faithfulness, pragmatic tests | AI prepares a draft; human takes care of pragmatics/accountability | Convincing false claims
Creativity | Conceptual breakthroughs, cultural embeddedness | Recombination of known patterns, high output | NSU (novelty-surprise-usefulness), human judgment | AI diversifies ideas; human selects/frames the problem | Homogenization, cultural "mode collapse"
Metacognition | Introspection, learning strategies, error monitoring | Uncertainty estimation, self-correction through tools | Calibration (ECE), self-consistency, error-correction rate | AI signals uncertainty; human makes the decision | Automation bias, over-trust
Social cognition (ToM) | Implicit theory of mind, situated interpretation of intentions | Imitation of ToM patterns in text | ToM tasks, pragmatic implicatures | AI proposes interpretations; human checks the context | "False empathy", manipulation
Consciousness | Phenomenal experience, subjectivity | No evidence of phenomenal consciousness | n/a (no consensus on metric tests) | n/a (treated as a tool) | Anthropomorphism, misattribution of properties
Effect on the user | Maintaining competences, autonomy | Risk of cognitive dependence | Longitudinal measurements of unaided competences | Goldilocks zone: planned use of AI + periodic unaided tasks | Cognitive atrophy, loss of motivation

5 Methodological approaches to comparison
In this section we combine (a) a presentation of the development trend of LLMs relative to humans (100 %) and (b) an overview of methodological coverage across cognitive dimensions. The purpose is twofold: to show how closely task performance has approached the human level, and to show where measurement and mechanistic explanation are already mature enough for such progress to be assessed reliably.

Description of Figure 1: The bar chart shows the relative performance (%) of LLMs about three years ago (≈2022) and today (2025) for nine cognitive dimensions, normalized to human = 100 %. It is a conceptual, expert assessment of trends based on known evals (e.g. MMLU/ARC/WinoGrande for reasoning, long context/RAG for memory, pragmatic tests and "hallucination rate" for language, etc.). The purpose of the chart is to visualize the approach to the human level and how much is still missing in each dimension.

We see a strong rise in language, attention/context tracking, and external memory, and noticeable progress in reasoning, learning from few data, creativity, and social cognition; metacognition grows more slowly. For "consciousness (phenomenal experience)" there is no progress: it remains at 0 %, which is consistent with the idea that the problem of multiple knowledge has not yet been solved. This is a conceptual, expert assessment of trends (not directly the results of a single benchmark); the purpose of the chart is to illustrate the difference between task performance and human properties.

Key findings (2022 → 2025; shortfall to 100 % in parentheses):
• Language 85→98 (–2): almost at the human level, but still vulnerable to hallucinations and weak pragmatics.
• Attention/context 60→80 (–20): a large jump due to longer contexts and better tracking.
• Memory 30→65 (–35), Learning (few-shot) 45→65 (–35): progress through RAG and in-context learning, but the gap remains.
• Reasoning (commonsense/causal) 35→60 (–40): a noticeable rise, but still far from stable causality.
• Social cognition (ToM/pragmatics) 40→60 (–40): improvements in textual scenarios, limited situatedness.
• Creativity (conceptual restructuring) 55→72 (–28): high output, fewer conceptual breakthroughs.
• Metacognition (self-regulation) 20→35 (–65): slow growth; uncertainty estimates are not yet genuine self-reflection.
• Consciousness (phenomenal experience) 0→0 (–100): no progress, consistent with the unsolved integration problem.

Figure 1: Progress of LLMs over three years relative to humans (=100 %). Relative progress of LLMs across cognitive properties over three years (≈2022→2025), normalized to human = 100 %. Large gains are visible in language, attention/context tracking, and external memory; medium gains in learning and reasoning; slow gains in metacognition; no progress in consciousness. The visual presentation is a conceptual synthesis of evals and serves to illustrate the difference between task performance and human properties.

Figure 2: Human–AI methodological matrix by dimension. The heat matrix (0–3) in Figure 2 rates the coverage of measurements according to five criteria: (1) match with human tests, (2) match with AI benchmarks, (3) construct validity, (4) measurability of mechanisms, (5) hybrid potential (measurable human↔AI contribution).

Higher values in Figure 2 mean that a more mature methodology exists for the given dimension, so the claims from Figure 1 are more reliable there (e.g. language, attention, memory). Lower values (creativity, consciousness) indicate that qualitative protocols, mixed methods, and cautious interpretation are needed.

4. Causal understanding and interventional reasoning – LLMs are strong at correlations and descriptions, but demanding causal questions (what would happen if we intervened on X?) remain weak without explicit causal models

Minimal protocol for a "fair" comparison:
1. For each construct we choose a pair human test ↔ AI benchmark and shared metrics (task + calibration).
2. We add a mechanistic analysis (probing/ablations for AI; EEG/fMRI/RSA for humans).
3. We run a hybrid A/B trial (without AI vs.
z UI) in oziroma simulacij, ki zahtevajo več kot napovedovanje retest po 1–3 mesecih za ohranitev kompetenc. besedila. 4. Poročamo tudi vrzel do 100 % (Slika 1) in zrelost meritev 5. Moralno presojanje in odgovornost – odgovori so rezultat (Slika 2), da ne mešamo nalogovne zmogljivosti z pravil in vzorcev iz podatkov ter umerjanja (RLHF), ne notranjimi lastnostmi. notranje vrednotne strukture. Model ne “razume” odgovornosti; tvega konsistentnost z normami le toliko, kolikor so te kodirane v podatkih in pravilih. 6 Etika in filozofske dileme, hibridna 6. Čustvena izkušnja – lahko opisuje in pravilno označuje inteligenca čustva, ne pa jih dejansko doživlja; posledično ni afektivne modulacije pozornosti, motivacije ali UI odpira pomembna etična in filozofska vprašanja. Floridi dolgotrajnih preferenc, kakršne oblikuje človeška & Cowls [32] predlagata okvir petih načel (dobrobit, avtonomija, homeostaza. pravičnost, razlaga, odgovornost), ki so ključna za etično rabo 7. Utelešenost in situacijska inteligenca – brez telesa, potreb UI. Filozofske razprave, kot je Searlov »Chinese Room in omejitev ostaja prilagajanje v realnem času omejeno. Argument« [33], pa opozarjajo, da so sistemi UI morda zgolj Tudi agentne razširitve ostanejo odvisne od zunanje sofisticirani infrastrukture (orodij, pravilnikov in varovalk). manipulatorji simbolov brez resničnega razumevanja. To odpira vprašanje meje med simulacijo in 8. Učenje v živo (continual/online learning) z zanesljivimi resnično inteligenco. posodobitvami – LLM-ji tipično ne posodabljajo Sodobne raziskave govorijo o hibridni inteligenci parametrov med rabo; trajne spremembe zahtevajo nov (Dellermann et al. [34]), ki temelji na integraciji človeških in trening ali zunanje pomnilnike. To omejuje kumulativno, umetnih kognitivnih sposobnosti. Človek prispeva razumevanje osebno prilagojeno znanje. pomena, etične presoje, kreativnost in socialno inteligenco, UI 9. 
Robustno reševanje novosti – pri res novih, slabo pokritih pa obdelavo podatkov, vzorčno prepoznavanje in skalabilnost. problemih se hitro pokaže regresija v stereotipe iz Takšna integracija presega meje posameznega sistema in korpusov; brez eksperimentiranja v svetu je inovacija nakazuje prihodnost sodelovanja, ne tekmovanja. pogosto “rekombinacija”, ne pa resnično odkritje. Zato je splošen vtis, da kljub izrednemu funkcionalnemu napredovanju LLM-ji še vedno nimajo ključnih človeških 7 Diskusija in zaključek lastnosti, kot je zavest. Analiza potrjuje, da človeška in umetna Naša analiza kaže, da se LLM-ji razvijajo z izjemno hitrostjo tudi inteligenca – čeprav obe temeljita na nevronskih sistemih – izhajata iz različnih mehanizmov: človeške sposobnosti so na področjih, ki jih pogosto razumemo kot “kognitivne”. V prepletene z zavestjo, afektom in socialno konstituiranim primerjavi z ugotovitvami Gams in Kramar (2024) [6] opažamo pomenom, medtem ko UI temelji na statistični inferenci nad premik od pretežno površinskih, vzorčno-temeljenih odgovorov velikimi korpusi. k bolj stabilnim večkorakom, boljšemu vodenju plana, Ob tem velja opozoriti na morebitno nevarnost kognitivne učinkovitejšemu uporabljanju zunanjih orodij ter k bolj atrofije: prekomerna zanašanje na UI lahko zmanjšuje motivacijo konsistentni samopopravi. Napredek je zlasti v zmožnosti daljših za samostojno reševanje problemov in s tem slabi določene verig sklepanja, delovnega spomina s pomočjo zunanje človeške kognitivne zmožnosti (npr. [34]). Zato naj bo prihodnja kontekstne obnove (RAG) ter v boljši meta-kontroli (npr. raba UI etično zasnovana in kognitivno uravnotežena: UI kot detekcija lastnih napak in zahteva po dodatnih podatkih). ojačevalnik, ne nadomestek. Kljub temu ostajajo nekatere temeljne človeške kognitivne Komplementarno partnerstvo med človekom in UI združuje lastnosti za LLM-je (še) nedosegljive. Ključne med njimi so: človeško razumevanje pomena, kavzalnost, socialno in čustveno 1. 
Fenomenalna zavest (qualia) – LLM-ji ne izkazujejo inteligenco ter ustvarjalnost z računsko močjo, skalabilnostjo in subjektivnega doživljanja. Njihova arhitektura ostaja natančnostjo UI. Takšna sinergija presega omejitve posameznih statistično napovedovanje naslednjega simbola brez sistemov in odpira pot k bolj kakovostnemu napredku znanosti, notranjega fenomenalnega prostora; “poročanje o umetnosti in družbe, k ustvarjanju »super« človeka. občutkih” je zgolj generativna imitacija vzorcev iz podatkov. 2. References / Literatura Stabilen jaz in agencija – modeli nimajo trajnega, telesno sidranega “sebstva” z lastnimi nameni. Cilji so zgolj [1] U. Neisser. 1967. Cognitive Psychology. Appleton-Century-Crofts, New implicitni v pozivu in optimizaciji izgube; ni kontinuitete York, NY. [2] J. R. Anderson. 2010. Cognitive Psychology and Its Implications. Worth, namer skozi čas brez zunanje orkestracije. New York, NY. 3. [3] A. R. Damasio. 1994. Descartes’ Error: Emotion, Reason, and the Human Semantična usidranost in referencialnost – pomen izhaja Brain. G. P. Putnam’s Sons, New York, NY. iz statističnih korelacij, ne iz utelešene interakcije s [4] G. Tononi, M. Boly, M. Massimini, and C. Koch. 2016. Integrated svetom. Brez senzorimotorike in lastnih izkustev je “o- information theory: from consciousness to its physical substrate. Nature Reviews Neuroscience 17, 450–461. čem-je-govor” (aboutness) posredno posnet iz korpusov, [5] OpenAI. 2023. GPT-4 Technical Report. arXiv:2303.08774. zato ostajajo zdrsi v referencah in halucinacije. [6] M. Gams and S. Kramar. 2024. Evaluating ChatGPT’s Consciousness and Its Capability to Pass the Turing Test: A Comprehensive Analysis. Journal 26 of Computer and Communications 12(3), 219–237. https://doi.org/10.4236/jcc.2024.123014 [7] M. I. Posner and S. E. Petersen. 1990. The attention system of the human brain. Annual Review of Neuroscience 13, 25–42. [8] A. Baddeley. 2003. Working memory: looking back and looking forward. 
Nature Reviews Neuroscience 4(10), 829–839. [9] N. Cowan. 2001. The magical number 4 in short-term memory: A reconsideration of mental storage capacity. Behavioral and Brain Sciences 24(1), 87–185. [10] P. A. Carpenter, M. A. Just, and P. A. Shell. 1990. What one intelligence test measures: A theoretical account of processing in the Raven Progressive Matrices Test. Psychological Review 97(3), 404–431. [11] K. E. Stanovich. 2010. What Intelligence Tests Miss: The Psychology of Rational Thought. Yale University Press, New Haven, CT. [12] [M. Csikszentmihalyi. 1996. Creativity: Flow and the Psychology of Discovery and Invention. HarperCollins, New York, NY. [13] [M. A. Runco and G. J. Jaeger. 2012. The standard definition of creativity. Creativity Research Journal 24(1), 92–96. [14] [M. H. Immordino-Yang. 2015. Emotions, Learning, and the Brain: Exploring the Educational Implications of Affective Neuroscience. W. W. Norton & Company, New York, NY. [15] [C. D. Frith and U. Frith. 2007. Social cognition in humans. Current Biology 17(16), R724–R732. https://doi.org/10.1016/j.cub.2007.05.068 [16] [A. Krizhevsky, I. Sutskever, and G. E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 25, 1097–1105. [17] Y. LeCun, Y. Bengio, and G. H. Hinton. 2015. Deep learning. Nature 521, 436–444. [18] M. Mitchell. 2019. Artificial Intelligence: A Guide for Thinking Humans. Farrar, Straus and Giroux, New York, NY. [19] B. M. Lake, T. D. Ullman, J. B. Tenenbaum, and S. J. Gershman. 2017. Building machines that learn and think like people. Behavioral and Brain Sciences 40, e253. [20] G. Marcus. 2020. The Next Decade in AI: Four Steps Towards Robust Artificial Intelligence. arXiv:2002.06177. [21] F. Chollet. 2019. On the Measure of Intelligence. arXiv:1911.01547. [22] D. Hendrycks, C. Burns, S. Basart, A. Zou, M. Mazeika, D. Song, and J. Steinhardt. 2020. Measuring Massive Multitask Language Understanding. 
arXiv:2009.03300.
[23] H. J. Levesque, E. Davis, and L. Morgenstern. 2012. The Winograd Schema Challenge. In Principles of Knowledge Representation and Reasoning (KR 2012), 552–561. AAAI Press.
[24] E. M. Bender and A. Koller. 2020. Climbing towards NLU: On meaning, form, and understanding in the age of data. In Proceedings of ACL 2020, 5185–5198.
[25] M. A. Boden. 2016. AI: Its Nature and Future. Oxford University Press, Oxford.
[26] S. Dehaene, H. Lau, and S. Kouider. 2017. What is consciousness, and could machines have it? Science 358(6362), 486–492. https://doi.org/10.1126/science.aan8871
[27] J. R. Searle. 1980. Minds, brains, and programs. Behavioral and Brain Sciences 3(3), 417–457.
[28] E. F. Risko and S. J. Gilbert. 2016. Cognitive offloading. Trends in Cognitive Sciences 20(9), 676–688. https://doi.org/10.1016/j.tics.2016.07.002
D. Dellermann, P. Ebel, M. Söllner, and J. M. Leimeister. 2019. Hybrid Intelligence. Business & Information Systems Engineering 61(5), 637–643.
[29] B. Shneiderman. 2022. Human-Centered AI. Oxford University Press, Oxford.
[30] L. Floridi and J. Cowls. 2019. A unified framework of five principles for AI in society. Harvard Data Science Review 1(1).
[31] E. F. Risko and S. J. Gilbert. 2016. Cognitive offloading. Trends in Cognitive Sciences 20(9), 676–688.
[32] B. Sparrow, J. Liu, and D. M. Wegner. 2011. Google Effects on Memory: Cognitive Consequences of Having Information at Our Fingertips. Science 333(6043), 776–778.
[33] B. Oakley, et al. 2025. Cognitive Effects of AI Overuse (forthcoming). MIT Press, Cambridge, MA.

Coherentist Echo Chambers

Martin Justin∗
Faculty of Arts, University of Maribor
Maribor, Slovenia
martin.justin1@um.si

Borut Trpin∗
Faculty of Arts, University of Ljubljana
Faculty of Arts, University of Maribor
Ljubljana and Maribor, Slovenia
borut.trpin@ff.uni-lj.si

∗ Both authors contributed equally to this research.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
Information Society 2025, 6–10 October 2025, Ljubljana, Slovenia
© 2025 Copyright held by the owner/author(s).
https://doi.org/10.70314/is.2025.cogni.3

Abstract

This paper investigates the transformation of epistemic bubbles into echo chambers through rational belief-forming processes. Building on Nguyen's distinction between epistemic bubbles (formed by omission) and echo chambers (formed by active distrust), we explore whether echo chambers can emerge without malicious intent. Using a simulation model, we demonstrate that coherence-based reasoning can trap agents in echo chambers, even when they act rationally. This finding challenges the view that echo chambers require intentional manipulation of epistemic trust and suggests that rational cognitive strategies may inadvertently contribute to harmful social epistemic dynamics.

Keywords

echo chambers, coherence, social epistemology, agent-based modeling

1 Introduction

In his seminal discussion, Nguyen [11] argues that, despite both being characterized by groupthink, epistemic bubbles and echo chambers are distinct social epistemic phenomena. Where epistemic bubbles are formed by excluding some relevant information sources, echo chambers are created by actively discrediting specific sources. Specifically, Nguyen asserts that echo chambers require "a significant disparity in (epistemic) trust between members and non-members." Consequently, echo chambers cannot be counteracted by exposing their members to additional sources of information.

Nguyen [11] accepts that epistemic bubbles can form accidentally, e.g., as a consequence of reading certain news sources, or of not actively seeking testimony from people beyond one's friend group. In contrast, he argues that the creation of echo chambers "is something more malicious," which involves discrediting institutions and individuals without regard for their actual epistemic worth. In his view, echo chambers are often (although not necessarily) created intentionally as a means to "maintain, reinforce and expand power through epistemic control" [11].

However, some later work in social epistemology has contradicted this claim. For example, Baumgaertner and Justwan [3] explore additional mechanisms that can cause the formation of echo chambers and that do not rely on manipulating epistemic trust. Using an agent-based polarization model, they show that echo chambers, where members persist in their beliefs despite exposure to contrary information, can arise via a combination of a social structure and agents' willingness to believe what their community invites them to. Unlike in Nguyen's analysis, these two mechanisms seem epistemically benign and do not require agents to expressly distrust specific information sources.

This raises an interesting question: can echo chambers arise in communities where agents act rationally? More specifically, if a community starts as an epistemic bubble, can a belief-forming process that we otherwise deem rationally acceptable prevent the members of the community from breaking out of it, thus transforming the community into an echo chamber?

In this paper, we conduct a simulation study showing that reasoning based on coherence can play this role in specific circumstances. Specifically, taking the coherence of their beliefs into account when considering whether to accept new information can prevent agents from escaping an epistemic bubble and trap them in an echo chamber. This suggests that a rational reasoning pattern can cause one of our time's more pernicious social epistemic phenomena.

The rest of this paper is organized as follows. In Section 2, we discuss coherence-based reasoning in more detail. We show that it can be rational in some circumstances, and discuss existing results about its possible negative social epistemic effects. The section concludes with a brief overview of formal measures of coherence. Section 3 presents our model, adapted from our previous work in [9, 21]. In Section 4, we present the simulation study results. Section 5 discusses the results and concludes the paper.

2 Coherence

2.1 The Role of Coherence in (Social) Reasoning

Intuitively, coherence describes how well propositions in a set "hang together" [4]. Take, for example, the difference between these two information sets. $S_1$ = {"A is a well-regarded author", "Critics praised A's last book", "A's last book was nominated for an important literary prize"}; $S_2$ = {"A is a well-regarded author", "Today is Thursday", "Python is a general purpose programming language"}. We intuitively sense that $S_1$ is more coherent than $S_2$: the propositions in $S_1$ support each other, while those in $S_2$ seem completely independent.

Coherentism is usually introduced as a theory of epistemic justification: a belief is justified if and only if it belongs to a coherent system of beliefs [13]. Although BonJour [4] gave one of the most influential modern defences, most epistemologists have since rejected the view. Still, coherence has been thought to play an important epistemic role. For example, Harman's account of reasoned belief revision treats coherence as central to belief management: a new commitment is accepted only if it contributes to the overall coherence of an agent's set of attitudes [7]. This perspective highlights that coherence is not only a theory of justification but also a guide to acceptance and belief revision. Similarly, Angere [1] argues that, despite not being truth-conducive in general, coherence can act as an effective heuristic for choosing a correct information set when more reliable methods are unavailable. On the other hand, Olsson [12] argues that coherence plays an important negative role: "If our beliefs show signs of incoherence, this is often a good reason for contemplating a revision." Goldberg and Khalifa [6] make a similar argument in a social context, arguing that agents are unjustified in holding beliefs that do not cohere with the accepted background information of their epistemic community.

In contrast to the work highlighting its positive epistemic role for individual reasoners, research also shows that coherence can have adverse social-epistemic effects. In a model by Singer et al. [20], agents deliberate by exchanging reasons, which either support or oppose a conclusion. Agents also have limited memory: when at capacity, they must "forget" one of their reasons to accept a new one. Agents using a coherence-based strategy for managing memory prefer to forget a reason that runs contrary to the view supported by the totality of their reasons. The authors argue that this is a rational strategy of memory management. Nevertheless, it leads to persistent group polarization between agents.

In essence, coherence underpins rational reasoning patterns but can also cause negative social-epistemic phenomena. Consequently, coherence-based reasoning is an interesting candidate for the context of our research problem: can echo chambers arise in communities where agents employ coherence-based reasoning? To answer this question, we develop an agent-based group learning model, wherein agents use coherence-based reasoning in information gathering and revision of beliefs. Specifically, when updating beliefs based on new information, agents first check whether this information would decrease the coherence of their beliefs. If not, they accept the update. If yes, they ignore the new information and stick to their existing beliefs. In short, agents refuse information that would make their beliefs less coherent.

2.2 Measuring Coherence

Before taking a closer look at this belief updating dynamic and the model in general, coherence must be defined in more detail and operationalized. Several probabilistic measures have been proposed to operationalize coherence, each capturing different intuitions about what makes a set of propositions coherent. Two key intuitions underlie these measures. The first one is:

Deviation from Independence: The less independent the propositions in the set are, the more coherent the set is.

The intuition here is that coherence derives from how strongly propositions are interconnected probabilistically. If their probability of occurring together is no more than chance, as in the case of $S_2$, a set is neither coherent nor incoherent. If they are more likely to hold jointly, the set is coherent, as in the case of $S_1$. In the reverse situation, the set is incoherent. This intuition was formalized by Shogenji [19] as the following measure for a set of propositions $\mathbf{S} = \{A_1, \ldots, A_n\}$:

$$\mathrm{coh}_{S}(\mathbf{S}) := \frac{P(A_1, \ldots, A_n)}{P(A_1) \cdots P(A_n)}$$

The second intuition is:

Relative Overlap: The more overlap among the propositions in a set, the more coherent the set is.

The idea behind this intuition is that coherence stems from agreement between propositions [13]. Propositions agree when, if one is true, the others are also true. Probabilistically, we can represent this as a comparison between the probability of the propositions holding jointly (i.e., when all of them are true at the same time) and the probability of their disjunction (i.e., when at least one of them is true). This intuition is formalized by Olsson [14] and Glass [5], who propose the following measure:

$$\mathrm{coh}_{OG}(\mathbf{S}) := \frac{P(A_1, \ldots, A_n)}{P(A_1 \vee \cdots \vee A_n)} = \frac{P(A_1, \ldots, A_n)}{1 - P(\neg A_1, \ldots, \neg A_n)}$$

The third measure we consider in our model is a crossover measure, recently proposed by Hartmann and Trpin [8], which combines elements of both intuitions:

$$\mathrm{coh}_{HT}(\mathbf{S}) := \frac{P(A_1, \ldots, A_n)}{P(A_1) \cdots P(A_n)} \Big/ \frac{1 - P(\neg A_1, \ldots, \neg A_n)}{1 - P(\neg A_1) \cdots P(\neg A_n)}$$

3 The Model

3.1 Model Entities: World and Agents

The model we used in this study is a slightly modified version of our model first presented in [9, 21]. In the model, agents try to form an accurate belief about the world by gathering and sharing information about it. The "world" in the model represents a field of interest or research, e.g., contemporary politics in some country, the stock market, or AI-powered drug development. Agents represent people learning and communicating about it, e.g., social media users, friends, coworkers, or scientists working on the same problem. They all gather information about the topic (read about it, listen to experts, conduct experiments), talk about it with others, and form opinions based on it.

More technically, the model world consists of a Bayesian network (BN), representing a set of probabilistically related events. A BN consists of a directed acyclic graph (DAG) and a conditional probability distribution (CPD) (see [15]). The DAG represents events (nodes), each either true or false, and the conditional dependencies between them (edges). The CPD contains information about the likelihood of each event occurring given the values of other events. Figure 1 shows one example of a simple BN, usually referred to as "Sprinkler", consisting of only four nodes. This is the BN we used in our simulations. (1)

Figure 1: The "sprinkler" network, where C, S, R, G are propositional variables with corresponding values C: "It is cloudy", ¬C: "It is not cloudy", S: "The sprinkler is turned on", ¬S: "The sprinkler is not turned on", R: "It rains", ¬R: "It does not rain", G: "The grass is wet" and ¬G: "The grass is not wet," and the corresponding probabilities of its CPD. Reproduced from [9].

The agents already have an accurate representation of the events in the world and their relations; in other words, they are aware of the structure of the BN in question. What they try to learn is the underlying probability distribution. (2) To do this, they repeatedly observe the world to learn about the values of the individual events. For example, one observation the agents might gather is $S_1$ = [Cloudy=False, Sprinkler=False, Rain=False, Wet Grass=False], another is $S_2$ = [Cloudy=True, Sprinkler=False, Rain=True, Wet Grass=True], etc. Agents then fit these observations over the BN via maximum likelihood estimation (MLE). That is, they form a subjective belief about the conditional probability distribution that is the most likely given their observations about the world.

Agents also share information with each other. They can be connected in different communication networks, which determine who can share information with whom. In our simulations, we use three such networks. A cycle connects each agent only to its two closest agents; a wheel is similar to the cycle with the addition of one central agent connected to all other agents; a complete network connects each agent to all other agents.

(1) In principle, we could use any BN; however, larger networks are computationally increasingly demanding.
(2) This is in contrast to some other agent-based models that utilize BNs, where agents learn about concrete values of one instantiation of the BN (see especially [2, 17]).

3.2 The Setup: Coherence-Based Reasoning, Misleading Information, and Dynamic Environment

We wish to explore whether coherence-based reasoning can trap agents into an echo chamber by preventing them from changing their beliefs in response to accurate information about the world. We need to extend the above model in three ways to model such a situation.

First, we need to add coherence-based reasoning. We do this as follows. Some agents do not simply form a belief about the CPD based on their information. Instead, when presented with new information, they first check whether the resulting new belief is at least as coherent as their existing one. To do this, they first determine the most probable state of the world based on the distribution incorporating the new information (e.g., that it is cloudy, the sprinkler is off, it is raining, and the grass is wet). Then, they check how coherent this state is using one of the coherence measures presented above. If this state is at least as coherent as the state that is most probable based only on their existing information, they accept the new information. If it is less coherent, they reject the new information.

Second, to mimic an epistemic bubble, we allow for situations in which agents fail to form accurate beliefs, not because of their own selective exposure, but because they lack access to reliable information. In our model this is captured in two ways: agents may start with inaccurate priors, implemented by setting their initial CPD as a parameter, and they may occasionally receive input from a misleading BN rather than the real-world BN. The misleading BN differs from the real one in its CPD. For example, while in the real world the sprinkler substantially increases the likelihood of wet grass, this relation may be absent in the misleading BN.

Third, we place agents in a dynamic epistemic environment, meaning that the chance of gathering misleading information decreases over time. This represents a gradual breaking of an epistemic bubble: agents start with inaccurate priors and are likely to gather misleading information. Gradually, they begin receiving more accurate information (we determine the rate of change as a parameter of the model). Usually, this would mean that agents would also gradually start to form more accurate beliefs. We are interested in whether coherence-based reasoning can prevent this, trapping agents in an echo chamber where they ignore the accurate source of information.

3.3 The Procedure

The simulations of the model proceed in rounds or steps. The following parameters are determined before the start of the simulation: the number of agents in a group, the number of agents using coherence-based reasoning, the way agents share information with each other, agents' priors, the chance of gathering misleading information at the start, and the rate of change in this chance. Each round of the simulation then consists of the following actions:
(1) Agents collect information.
(2) Agents share information.
(3) Agents update their beliefs based on their type:
(a) Coherentist agents first check the coherence of their new belief, and accept it only if it is at least as coherent as their prior belief.
(b) Other agents straightforwardly update based on the new information.

4 Results

We simulated groups of 10 agents with 2, 5, or 8 coherentist agents. The agents were connected in a cycle, a wheel, or a complete network; coherentists and other agents were shuffled and placed randomly, so that their distribution on the network would not bias the results. In all groups, agents' starting subjective probability distribution (i.e., their belief) was the same as that of the misleading information source. We generated the misleading information source by randomly changing the distribution of the base "Sprinkler" BN. The only constraint was that the misleading information source was more coherent, i.e., it scored better on each of the three coherence measures.

Each agent drew 100 samples from the information source per round. At the start of the simulation, they had a 100% chance of drawing information from the misleading source. This chance then decreased by 1 percentage point per round, so that from round 100 onward agents only received accurate information about the world. To test the persistence of any effect coherence-based reasoning might have, we ran each simulation for 300 rounds.

Figure 2 shows how the accuracy of agents' beliefs changes over time. Belief accuracy is measured as the Kullback-Leibler (KL) divergence of the agent's probability distribution from the actual world's distribution, which quantifies the discrepancy between two probability distributions [10]; consequently, lower values mean more accurate beliefs. The red line represents the belief of a non-coherentist agent, averaged over all parameters. The three blue lines represent the beliefs of coherentist agents for different network structures, averaged over the other parameters.

The figure shows that agents who do not employ coherence-based reasoning reliably form accurate beliefs about the world. This is expected: these agents take their information at face value, so nothing prevents them from updating on more accurate information. Coherentist agents, on the other hand, on average retain inaccurate beliefs despite being presented with accurate information. After round 100, agents do not receive misleading information; the fact that coherentist agents' beliefs on average do not change much after that shows that they practically insulate themselves from the accurate information. In other words, they seem to actively ignore an accurate information source, which is how Nguyen [11] defines echo chambers. This result is robust across different communication networks, but the effect seems to be amplified by sparser communication.
Information Society 2025, 6–10 October 2025, Ljubljana, Slovenia. Justin and Trpin.

Figure 2: Distance of agents' beliefs from truth over time for a more coherent source of misleading information. The X-axis represents steps in the simulation, and the Y-axis represents distance from truth measured as the KL-divergence. The shaded regions represent 95% confidence intervals.

Table 1 gives a more comprehensive picture of the coherentist agent's average belief accuracy (expressed as distance from truth) for different combinations of parameters. These results again show that the communication structure affects the outcome. In most cases, agents connected in a complete network ended up closer to the truth than agents connected in the wheel or the cycle networks in comparable situations. Additionally, we can see that when coherence was measured as deviation from independence [19], the average effect on belief accuracy was the lowest.

Coh. Measure     Nr. Coh.   Complete      Wheel         Cycle
Olsson-Glass     2          0.52 (0.06)   0.56 (0.05)   0.67 (0.04)
Olsson-Glass     5          0.54 (0.06)   0.59 (0.03)   0.71 (0.02)
Olsson-Glass     8          0.53 (0.06)   0.59 (0.03)   0.65 (0.02)
Shogenji         2          0.35 (0.05)   0.46 (0.05)   0.48 (0.05)
Shogenji         5          0.48 (0.06)   0.43 (0.03)   0.42 (0.02)
Shogenji         8          0.38 (0.04)   0.41 (0.03)   0.47 (0.03)
Hartmann-Trpin   2          0.58 (0.06)   0.56 (0.05)   0.67 (0.04)
Hartmann-Trpin   5          0.63 (0.06)   0.63 (0.03)   0.66 (0.04)
Hartmann-Trpin   8          0.52 (0.06)   0.63 (0.03)   0.63 (0.02)

Table 1: Average distance from truth of agents' belief at the end of the simulation for different combinations of parameters (with one standard error in parentheses).

5 Discussion

These results suggest that coherence-based reasoning may lead to the creation of echo chambers. In contrast to other proposed mechanisms of echo chamber creation, e.g., active mistrust of certain information sources [11], coherence-based reasoning is not irrational; on the contrary, some argue that it can have positive epistemic value. This implies that potentially rational reasoning patterns can lead to pernicious social epistemic phenomena.

That said, our study has two significant limitations. First, the misleading information source presented a more coherent picture of the world than the truth. This is not an unreasonable assumption: conspiracy theories and misinformation often offer more straightforward, intuitive, and coherent explanations of complicated events than the evidence does. Nevertheless, it might substantially affect our results. Given that coherentist agents consider information based on its coherence, accurate information might manage to overcome a less coherent misinformation source even for these agents. Running simulations with an alternative misinformation source that scores worse on the coherence measures is thus a vital robustness check we must consider in the future.

The second limitation concerns the nature of our study. The agent-based model we used is highly idealized: learning about the world is represented by drawing samples from a BN, communication by sharing these samples, belief revision by MLE (a mathematical procedure), and so on. As various philosophers of science have pointed out, such highly idealized models cannot be used to make real-world predictions, provide actual explanations for phenomena, or suggest normative prescriptions [18, 16, 22]. It would be wrong to conclude that coherence-based reasoning is the cause of people's persistent false beliefs.

Although idealized models cannot explain real-world phenomena, they provide so-called "how-possible explanations" [16]. In our case, the results point to coherence-based reasoning as a possible explanation for echo chamber formation. Empirical studies are needed to show this link in practice.

Acknowledgements

The authors acknowledge the financial support from the Slovenian Research and Innovation Agency (ARIS, project J6-60107 and research core funding No. P6-0144).

References

[1] Staffan Angere. 2008. Coherence as a Heuristic. Mind, 117, 465, (Jan. 2008).
[2] Leon Assaad, Rafael Fuchs, Ammar Jalalimanesh, Kirsty Phillips, Leon Schoeppl, and Ulrike Hahn. 2023. A Bayesian agent-based framework for argument exchange across networks. arXiv eprint: 2311.09254 (cs.SI).
[3] Bert Baumgaertner and Florian Justwan. 2022. The preference for belief, issue polarization, and echo chambers. Synthese, 200, 5, (Sept. 2022), 412. doi: 10.1007/s11229-022-03880-y.
[4] Laurence BonJour. 1985. The Structure of Empirical Knowledge. Harvard University Press, Cambridge, MA.
[5] David H. Glass. 2002. Coherence, explanation, and Bayesian networks. In Artificial Intelligence and Cognitive Science, 13th Irish Conference, AICS 2002. Michael O'Neill, Richard F. E. Sutcliffe, Conor Ryan, Malachy Eaton, and Niall J. L. Griffith, editors. Springer, Berlin, 177–182.
[6] Sanford C. Goldberg and Kareem Khalifa. 2022. Coherence in Science: A Social Approach. Philosophical Studies, 179, 12, (Dec. 2022), 3489–3509. https://link.springer.com/article/10.1007/s11098-022-01849-8.
[7] Gilbert Harman. 1986. Change in View: Principles of Reasoning. The MIT Press.
[8] S. Hartmann and B. Trpin. Forthcoming. Why coherence matters? The Journal of Philosophy.
[9] Martin Justin and Borut Trpin. 2025. Coherence-Based Evidence Filtering: A Computational Exploration. In Proceedings of the Annual Meeting of the Cognitive Science Society 47.
[10] Solomon Kullback and Richard A Leibler. 1951. On information and sufficiency. The Annals of Mathematical Statistics, 22, 1, 79–86.
[11] C. Thi Nguyen. 2020. Echo chambers and epistemic bubbles. Episteme, 17, 2, (June 2020), 141–161. doi: 10.1017/epi.2018.32.
[12] Erik Olsson. 2023. Coherentist Theories of Epistemic Justification. In The Stanford Encyclopedia of Philosophy. (Winter 2023 ed.). Edward N. Zalta and Uri Nodelman, editors. Metaphysics Research Lab, Stanford University.
[13] Erik J. Olsson. 2022. Coherentism. Cambridge University Press, Cambridge.
[14] Erik J. Olsson. 2002. What is the problem of coherence and truth? The Journal of Philosophy, 99, 5, 246–72.
[15] Judea Pearl. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers, San Francisco, CA.
[16] Alexander Reutlinger, Dominik Hangleiter, and Stephan Hartmann. 2018. Understanding (with) toy models. The British Journal for the Philosophy of Science, 69, 4, (Dec. 2018), 1069–1099. doi: 10.1093/bjps/axx005.
[17] Klee Schöppl. 2025. Industry Influencing Collective Scientific Reasoning: A Bayesian, Agent-based Exploration. In Proceedings of the Annual Meeting of the Cognitive Science Society, 47.
[18] Dunja Šešelja. 2023. Agent-based modeling in the philosophy of science. (2023). https://plato.stanford.edu/entries/agent-modeling-philscience/#PeerDisaScie_1.
[19] Tomoji Shogenji. 1999. Is coherence truth conducive? Analysis, 59, 4.
[20] Daniel J. Singer, Aaron Bramson, Patrick Grim, Bennett Holman, Jiin Jung, Karen Kovaka, Anika Ranginani, and William J. Berger. 2019. Rational social and political polarization. Philosophical Studies, 176, 9, 2243–2267.
[21] Borut Trpin and Martin Justin. 2025. Coherence as a constraint on scientific inquiry. Synthese.
[22] Michael Weisberg. 2007. Three kinds of idealization. Journal of Philosophy, 104, 12, 639–659. doi: 10.5840/jphil20071041240.

Received 22 August 2025; revised 16 September 2025; accepted 16 September 2025

Large Language Models for Psychiatric Interview Analysis: An Exploratory Pilot Study

Katarina Lodrant (kl19928@student.uni-lj.si), Department of Cognition, Emotion, and Methods in Psychology, Faculty of Psychology, University of Vienna, Vienna, Austria; University of Ljubljana, Ljubljana, Slovenia
Filip Melinščak, Department of Cognition, Emotion, and Methods in Psychology, Faculty of Psychology, University of Vienna, Vienna, Austria
Ayse Nur Beris, Department of Child and Adolescent Psychiatry, Medical University of Vienna, Vienna, Austria
Valentin Schneider, Department of Child and Adolescent Psychiatry, Medical University of Vienna, Vienna, Austria
Klara Czernin, Department of Child and Adolescent Psychiatry, Medical University of Vienna, Vienna, Austria
Waltraud Bangerl, Department of Child and Adolescent Psychiatry, Medical University of Vienna, Vienna, Austria
Anselm Bründlmayer, Department of Child and Adolescent Psychiatry, Medical University of Vienna, Vienna, Austria
Frank Scharnowski, Department of Cognition, Emotion, and Methods in Psychology, Faculty of Psychology, University of Vienna, Vienna, Austria
Clarissa Laczkovics, Department of Child and Adolescent Psychiatry, Medical University of Vienna, Vienna, Austria
David Steyrl, Department of Cognition, Emotion, and Methods in Psychology, Faculty of Psychology, University of Vienna, Vienna, Austria

Abstract

This exploratory pilot study investigates the use of large language models (LLMs) for automated analysis of psychiatric interviews. Using transcripts from the Structured Interview of Personality Organization (STIPO-R), we tested GPT-4o across three paradigms: direct application of clinical scoring guidelines, emulation of a validated psychometric scale, and exploratory construct elicitation. LLM-derived scores strongly correlated with clinician ratings and captured clinically relevant constructs. Findings highlight opportunities for scalable, theory-driven assessment of patient language, but also underscore challenges including interpretability, reproducibility and data privacy.

Keywords

Large Language Models, Clinical Language Analysis, AI in Mental Health, Sentiment Analysis, Identity Diffusion

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
Information Society 2025, Ljubljana, Slovenia
© 2025 Copyright held by the owner/author(s).
https://doi.org/10.70314/is.2025.cogni.12

1 Introduction

In psychiatry, clinicians are often required to make complex diagnostic judgments without definitive biological markers. Instead, assessments rely on observable behavior, subjective self-report, and, crucially, on language [1]. Patient language provides a uniquely rich source of information: it reflects patterns of thought, emotional states, and interpersonal dynamics, all of which are central to understanding mental functioning [2]. An abundance of naturalistic speech emerges from clinical interviews and therapy sessions, underscoring the need for systematic methods that can both detect subtle psychological cues and handle large volumes efficiently.

Automated methods for language analysis have evolved from early dictionary-based tools such as the Linguistic Inquiry and Word Count (LIWC), which provided interpretable but context-insensitive results [3], to embedding-based models like Word2Vec [4], BERT [5], and RoBERTa [6], which offered greater contextual sensitivity at the cost of reduced interpretability and greater technical complexity. More recently, large language models (LLMs) such as GPT [7] have emerged as flexible, prompt-driven analyzers.

Researchers have argued that GPT may be a superior tool for automated text analysis, achieving high accuracy on various tasks across languages without training data and with minimal coding demands [8, 9]. Yet others caution that risks of bias, poor reproducibility, opacity, and overreliance remain. In some contexts, established, validated models still outperform LLMs, and researchers must weigh not only how LLMs can be applied, but whether their use is beneficial given the risks [10].

Analyses of patient language have identified linguistic markers associated with various psychiatric conditions [11, 12, 13, 1, 14]. A 2022 review by Zhang et al. [15] highlighted the growing use of natural language processing (NLP) for mental illness detection, noting that social media texts remain the most common data source. In contrast, relatively few studies have examined transcripts of patient speech [16, 17]. This gap likely reflects the scarcity of suitable datasets: such data is usually not recorded, and when it is, audio recordings and transcripts often contain sensitive personal information and therefore cannot be publicly shared. Moreover, speech data typically require supporting ground-truth measures (e.g., validated questionnaires or clinical assessments) to be useful for research.

Recent advances in automatic transcription, together with the emergence of LLMs, have opened new directions for systematic analysis of patient-generated language. Unlike earlier approaches, LLMs combine ease of use with a seemingly unprecedented sensitivity to linguistic context. In this work, we examine the opportunities and challenges they present through a pilot analysis of transcripts of the Structured Interview of Personality Organization (STIPO-R), a validated psychoanalytic diagnostic instrument.

2 Methods

2.1 Dataset

We analyzed a subset of data collected by Laczkovics et al. (2025) during the validation of the German STIPO-R for adolescents [18, 19]. The STIPO interview assesses multiple domains of personality functioning. For this study, we focused on the identity domain, which consists of 13 open-ended questions addressing areas such as self-perception, perception of others and engagement in school and recreation. These questions typically elicit rich narrative responses that are well-suited for language analysis. Responses were evaluated by trained clinicians on 15 items, each rated on a three-point scale (0 = no pathology, 1 = moderate pathology, 2 = severe pathology), producing a total identity diffusion score ranging from 0 to 30. This clinician-rated score served as the ground truth for evaluating LLM performance. From the original study sample of 171 participants [18], 70 provided data of sufficient quality for the present analyses: 49 patients with a probable or definite personality disorder (PD) diagnosis and 21 controls without PD. From this set, we derived a subsample of 25 participants (16 patients and 9 controls), aged 14–19 years, using a stratified selection procedure to ensure even coverage of the full spectrum of identity pathology, from consolidated identity (low diffusion scores) to highly diffused identity.

2.2 LLM Setup

We used GPT-4o [20], accessed via a secure Python API connection under GDPR-compliant data protection. Interview transcripts were in German, while prompts were written in English. Prior work suggests that English prompts improve model performance even when applied to other languages [21, 9]. The model temperature was set to 0 to make outputs for identical prompts as consistent as possible.

2.3 Experimental Paradigms

We tested three experimental paradigms that elicited numeric ratings from the LLM, alongside a lexicon-based sentiment analysis baseline for comparison.

First, in a Direct STIPO Scoring approach, the official STIPO-R rating guidelines were copied verbatim into prompts, and the model was asked to assign 0–2 scores to individual items, paralleling the procedure used by clinicians in our dataset. Item-level and total scores were compared with clinician ratings.

Second, in a Scale Emulation paradigm, we tested whether the LLM could approximate a validated psychometric measure by inferring likely responses to scale items from interview transcripts rather than direct self-report. Specifically, we used the Self-Concept Clarity Scale (SCCS) [22], a 12-item self-report instrument. Each item was presented to the model together with the identity section of the transcript, and the model was instructed to assign a 1–5 Likert score. Item scores were summed to yield a total SCCS score, which we compared with clinician-rated identity diffusion. Conceptually, and as supported by empirical work, Otto Kernberg's notion of identity diffusion assessed in the STIPO is closely related to Campbell's construct of self-concept clarity [23, 24].

Third, we applied an exploratory Construct Rating approach, in which we developed rubrics for (a) overall valence (positivity vs. negativity of the response), (b) self-perception (positive vs. negative evaluation of the self), and (c) other-perception (positive vs. negative evaluation of others, including individuals, groups, relationships, or people in general). The construct definitions and prompts were drafted with assistance from ChatGPT-5. Ratings were given on a 1–7 scale, with an NA option if the construct was not referenced. Interviews were split into individual question–answer pairs (24–83 per subject), which served as the unit of analysis. The model was prompted separately for each unit and construct, and subject-level scores were calculated as the mean across units. We compared these mean construct ratings with clinician-rated identity diffusion, hypothesizing that more severe identity diffusion would be associated with more negative language (overall, in descriptions of the self, and in descriptions of others). To evaluate interpretability and reliability, we re-ran analyses where the score was extreme (1 or 7) and asked the model to provide both a score and a brief justification. As an exploratory validity check, a cognitive science master's student (an author of this study) reviewed randomly selected transcript excerpts together with the LLM ratings and justifications, assessing whether the assigned scores were plausible and consistent with the intended construct. Additionally, we tested robustness by repeating the analyses with an alternative 0–5 scale.

Sentiment Analysis. As a simple and interpretable baseline, we used GerVADER [25], a German sentiment lexicon in which each word is assigned a valence score (–4 = very negative, 0 = neutral, +4 = very positive) based on human ratings of perceived positivity or negativity. This choice was motivated by Colibazzi et al. [26], who applied the VADER lexicon [27] to STIPO transcripts. For each question–answer unit, we extracted the patient's response, identified words present in the lexicon, retrieved their valence scores, and calculated three metrics: (a) overall sentiment (mean of all scores), (b) negative sentiment (mean of negative scores only), and (c) positive sentiment (mean of positive scores only). These answer-level values were then averaged across all answers to obtain per-subject scores, which were compared with both the LLM-derived ratings of overall valence and clinician ratings of identity diffusion.

All comparisons were tested with Pearson correlations (p-values corrected for multiple comparisons; α = 0.05).

3 Results

Direct STIPO Scoring. Summed scores produced by GPT-4o strongly correlated with clinician ratings (r = 0.90), as illustrated in Figure 1. Item-level agreement was exact in 66% of cases.

Figure 1: Correlation between clinician-rated and LLM-rated STIPO identity scores.

Scale Emulation. SCCS scores derived from LLM outputs correlated negatively with clinician-rated identity diffusion (r = –0.82; Figure 2). This finding is in line with the conceptual link between identity coherence and self-concept clarity.

Figure 2: Correlation between clinician-rated STIPO identity scores and LLM-rated Self-Concept Clarity.

Exploratory Construct Ratings. Average overall valence correlated negatively with identity pathology (r = –0.82; Figure 3), suggesting that more severely affected adolescents used more negative language overall.

Figure 3: Correlation between clinician-rated STIPO identity scores and LLM-rated overall valence of answers.

Self- and other-perception ratings were also associated with clinician scores (r = –0.81 and –0.57). Manual checks indicated that the model generally distinguished references to the self from references to others. When re-run with prompts requesting both a rating and a brief justification, this distinction improved: all cases lacking a relevant reference were correctly scored as NA. However, requesting a justification noticeably shifted the ratings: extreme values were often moderated toward the midrange. Changing the rating scale from 1–7 to 0–5 did not materially affect the results (r = 0.98). In all cases, the model produced valid outputs in the requested format.

Lexicon-Based Sentiment Analysis. Sentiment analysis with GerVADER revealed a significant correlation between clinician ratings and mean negative sentiment (r = –0.47), but not with mean overall or mean positive sentiment. This correlation was weaker than that between clinician ratings and LLM-derived overall valence, highlighting the limitations of context-insensitive, bag-of-words methods. Manual inspection confirmed that GPT-4o often inferred negativity from conversational context rather than from explicitly negative words. Mean negative sentiment was also significantly correlated with LLM-derived overall valence (r = 0.68).

4 Discussion

This exploratory study shows that LLMs can approximate expert ratings of psychiatric interviews and apply psychometric constructs to clinical transcripts, while also highlighting barriers that preclude immediate clinical use. In the following, we outline opportunities, risks, and challenges, and suggest pathways for more rigorous validation.

4.1 Opportunities

LLMs perform reliably on structured clinical tasks. Using only verbatim scoring guidelines, GPT-4o approximated expert scoring of the STIPO, a task that typically requires extensive training. While LLMs should not replace clinicians, they could provide secondary checks in research settings or serve as teaching tools to illustrate scoring rules, highlight ambiguities and improve teaching materials.

Applying validated psychometric scales through LLMs anchors automated analyses in established theory. The strong correlation between LLM-rated self-concept clarity and clinician-rated identity diffusion supports the validity of this approach and suggests that LLMs can extend the reach of standardized assessments in scalable ways.

By contrast, defining new constructs ad hoc is more vulnerable to misspecification and requires iterative prompt engineering. Nevertheless, this strategy may capture clinically relevant, context-sensitive phenomena that remain inaccessible to conventional language-processing methods, potentially opening pathways to subtle markers of pathology.

LLMs further offer efficiency in time and cost, scalability to large datasets, cross-linguistic applicability, and the ability to rapidly test new rating schemes or constructs.

4.2 Risks and Challenges

The study also underscores multiple risks.

Interpretability and the black-box problem. LLMs remain opaque, and their internal decision processes are currently inaccessible. Some surface interpretability is possible; for instance, researchers can manually compare scores with text samples, or request rationales from the model. However, such rationales are post hoc, primarily useful for illustrating reasoning, and cannot be assumed to reflect the actual mechanisms behind the model's ratings.

Reproducibility and test–retest reliability. A key concern is reproducibility across time. Outputs vary not only due to stochasticity but also across different versions of the same model. Because earlier GPT versions are not preserved, analyses cannot be rerun on identical models. Even with temperature fixed at 0 in our study, small prompt variations, such as requesting reasoning, produced measurable differences in outputs, a well-documented phenomenon [28, 21]. Moreover, newer versions are not always improvements: performance can regress on certain tasks [9, 10]. Such variability poses significant challenges for scientific applications, where reproducibility is essential.

Data privacy and ethics. Patient language data are highly sensitive. While GDPR-compliant API contracts ensure encryption and prevent storage or retraining, the ethical stakes remain high. An alternative is to deploy LLMs locally, which enhances data security but requires substantial technical expertise and computing resources. Beyond privacy, there are broader risks of misuse: LLMs could be applied to surveillance or automatic 'flagging' of individuals, raising concerns about autonomy and stigmatization. Awareness of such possibilities is essential to anticipate and counter harmful applications, in line with international guidelines for trustworthy AI [29].

Bias and fairness. Training data for LLMs may embed demographic, cultural, or linguistic biases [10]. In psychiatry, this is particularly dangerous, as dialectal or culturally specific expressions may be misclassified as pathological.

Overreliance and face validity. The fluency and confidence of LLM outputs create risks of undue trust. Clinicians and researchers may treat model scores as authoritative, even when they are unreliable. In healthcare contexts, this raises ethical concerns: automatically generated reports or diagnostic suggestions may be accepted without scrutiny, especially if embedded in clinical workflows.

Prompt engineering. Contrary to claims that LLMs like GPT are easy-to-use, generalist tools that can handle a wide range of text analysis tasks with little coding or training data [9, 8], effective prompting remains challenging and requires significant expertise [21, 28]. A comprehensive 2025 survey of prompting strategies by Schulhoff et al. [21] concluded that robust prompts must balance specificity and flexibility, be iteratively refined, and be validated against examples. Well-designed prompts can reduce bias and instability, whereas underspecified prompts yield inconsistent outputs and overly prescriptive prompts risk forcing artificial ratings. Systematic, theory-driven prompt development aligned with established constructs is therefore essential.

4.3 Pathways for Validation

As the field is still developing, applications of LLMs for language analysis should be guided by comprehensive validation to help mitigate risks of opacity, instability, and bias. Critical steps include:

• Datasets with multiple ground-truth measures: clinician ratings, validated scales, and demographics to enable triangulation.
• Benchmarking against non-generative approaches: e.g., LIWC, RoBERTa, or traditional machine learning classifiers.
• Cross-model robustness: comparing results across different LLMs (GPT, Claude, Llama).
• Evaluation prompts: asking models to assess their own outputs or, in a multi-agent setup, to evaluate the output of another model (e.g., "Do you agree with this score?").
• Manual inspection: qualitative review of outputs, ideally conducted through interdisciplinary collaboration between domain specialists (e.g., clinicians) and those designing the prompts.
• Perturbation tests: checking stability by slightly altering prompts or text snippets.

5 Conclusion

Language remains psychiatry's most fundamental source of information. Automated analysis of clinical transcripts offers a route toward scalable, theory-driven markers of psychopathology. Our pilot study suggests that LLMs can approximate expert scoring, apply validated psychometric instruments, and flexibly analyze novel constructs with promising validity. Yet these opportunities are tempered by challenges of interpretability, reproducibility, and ethics. We argue that LLMs can serve as valuable research companions and have the potential to benefit clinical diagnostics when integrated cautiously, transparently, and in theory-driven ways.

References

[1] Cheryl M. Corcoran, Vijay A. Mittal, Carrie E. Bearden, Raquel E. Gur, Kasia Hitczenko, Zarina Bilgrami, Aleksandar Savic, Guillermo A. Cecchi, and Phillip Wolff. 2020. Language as a biomarker for psychosis: A natural language processing approach. Schizophrenia Research. Biomarkers in the Attenuated Psychosis Syndrome 226, (Dec. 2020), 158–166. doi:10.1016/j.schres.2020.04.032.
[2] Joshua Conrad Jackson, Joseph Watts, Johann-Mattis List, Curtis Puryear, Ryan Drabble, and Kristen A. Lindquist. 2022. From Text to Thought: How Analyzing Language Can Advance Psychological Science. Perspectives on Psychological Science, 17, 3, (May 2022), 805–826. doi:10.1177/17456916211004899.
[3] Yla R. Tausczik and James W. Pennebaker. 2010. The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods. Journal of Language and Social Psychology, 29, 1, (Mar. 2010), 24–54. doi:10.1177/0261927X09351676.
[4] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. arXiv:1301.3781 [cs]. (Sept. 2013). doi:10.48550/arXiv.1301.3781.
[5] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Jill Burstein, Christy Doran, and Thamar Solorio, editors. Association for Computational Linguistics, Minneapolis, Minnesota, (June 2019), 4171–4186. doi:10.18653/v1/N19-1423.
[6] Yinhan Liu et al. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv:1907.11692 [cs]. (July 2019). doi:10.48550/arXiv.1907.11692.
[7] OpenAI et al. 2024. GPT-4 Technical Report. arXiv:2303.08774 [cs]. (Mar. 2024). doi:10.48550/arXiv.2303.08774.
[8] Mostafa M. Amin, Erik Cambria, and Björn W. Schuller. 2023. Will Affective Computing Emerge from Foundation Models and General AI? A First Evaluation on ChatGPT. arXiv:2303.03186 [cs]. (Mar. 2023). doi:10.48550/arXiv.2303.03186.
[9] Steve Rathje, Dan-Mircea Mirea, Ilia Sucholutsky, Raja Marjieh, Claire E. Robertson, and Jay J. Van Bavel. 2024. GPT is an effective tool for multilingual psychological text analysis. Proceedings of the National Academy of Sciences, 121, 34, (Aug. 2024), e2308950121. doi:10.1073/pnas.2308950121.
[10] Suhaib Abdurahman, Mohammad Atari, Farzan Karimi-Malekabadi, Mona J Xue, Jackson Trager, Peter S Park, Preni Golazizian, Ali Omrani, and Morteza Dehghani. 2024. Perils and opportunities in using large language models in psychological research. PNAS Nexus, 3, 7, (July 2024), pgae245. doi:10.1093/pnasnexus/pgae245.
[11] Robin Quillivic, Yann Auxéméry, Frédérique Gayraud, Jacques Dayan, and Salma Mesmoudi. 2025. Linguistic markers for identifying post-traumatic stress disorder and associated symptoms: a systematic literature review. Journal of the American Medical Informatics Association: JAMIA, (May 2025), ocaf075. doi:10.1093/jamia/ocaf075.
[12] Erik C. Nook. 2023. The Promise of Affective Language for Identifying and Intervening on Psychopathology. Affective Science, 4, 3, (Sept. 2023), 517–521. doi:10.1007/s42761-023-00199-w.
[13] Felipe Argolo et al. 2024. Natural language processing in at-risk mental states: enhancing the assessment of thought disorders and psychotic traits with semantic dynamics and graph theory. Brazilian Journal of Psychiatry. doi:10.47626/1516-4446-2023-3419.
[14] Cheryl Mary Corcoran and Guillermo A. Cecchi. 2020. Using Language Processing and Speech Analysis for the Identification of Psychosis and Other Disorders. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging. Understanding the Nature and Treatment of Psychopathology: Letting the Data Guide the Way 5, 8, (Aug. 2020), 770–779. doi:10.1016/j.bpsc.2020.06.004.
[15] Tianlin Zhang, Annika M. Schoene, Shaoxiong Ji, and Sophia Ananiadou. 2022. Natural language processing applied to mental illness detection: a narrative review. npj Digital Medicine, 5, 1, (Apr. 2022), 1–13. doi:10.1038/s41746-022-00589-7.
[16] Alina Arseniev-Koehler, Sharon Mozgai, and Stefan Scherer. 2018. What type of happiness are you looking for? - A closer look at detecting mental health from language. In Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology: From Keyboard to Clinic. Kate Loveys, Kate Niederhoffer, Emily Prud'hommeaux, Rebecca Resnik, and Philip Resnik, editors. Association for Computational Linguistics, New Orleans, LA, (June 2018), 1–12. doi:10.18653/v1/W18-0601.
[17] Michelle Renee Morales and Rivka Levitan. 2016. Speech vs. text: A comparative analysis of features for depression detection systems. In 2016 IEEE Spoken Language Technology Workshop (SLT). (Dec. 2016), 136–143. doi:10.1109/SLT.2016.7846256.
[18] C. Laczkovics et al. 2025. Assessment of personality disorders in adolescents – a clinical validity and utility study of the structured interview of personality organization (STIPO). Child and Adolescent Psychiatry and Mental Health, 19, 1, (May 2025), 49. doi:10.1186/s13034-025-00901-9.
[19] John F Clarkin, Eve Caligor, Barry L Stern, and Otto F Kernberg. 2016. STRUCTURED INTERVIEW OF PERSONALITY ORGANIZATION: STIPO-R.
[22] Jennifer D. Campbell, Paul D. Trapnell, Steven J. Heine, Ilana M. Katz, Loraine F. Lavallee, and Darrin R. Lehman. 1996. Self-concept clarity: Measurement, personality correlates, and cultural boundaries. Journal of Personality and Social Psychology, 70, 1, 141–156. doi:10.1037/0022-3514.70.1.141.
[23] Otto F. Kernberg. 1984. Severe Personality Disorders: Psychotherapeutic Strategies. Yale University Press. isbn: 978-0-300-03273-4.
[24] J. Wesley Scala, Kenneth N. Levy, Benjamin N. Johnson, Yogev Kivity, William D. Ellison, Aaron L. Pincus, Stephen J. Wilson, and Michelle G. Newman. 2018. The Role of Negative Affect and Self-Concept Clarity in Predicting Self-Injurious Urges in Borderline Personality Disorder Using Ecological Momentary Assessment. Journal of Personality Disorders, 32, Supplement, (Jan. 2018), 36–57. doi:10.1521/pedi.2018.32.supp.36.
[25] Karsten Michael Tymann, Matthias Lutz, Patrick Palsbroker, and Carsten Gips. [n. d.] GerVADER - A German adaptation of the VADER sentiment analysis tool for social media texts.
[26] Tiziano Colibazzi, Avner Abrami, Barry Stern, Eve Caligor, Eric A. Fertuck, Michael Lubin, John Clarkin, and Guillermo Cecchi. 2023. Identifying Splitting Through Sentiment Analysis. Journal of Personality Disorders, 37, 1, (Feb. 2023), 36–48. doi:10.1521/pedi.2023.37.1.36.
[27] C. Hutto and Eric Gilbert. 2014. VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text. Proceedings of the International AAAI Conference on Web and Social Media, 8, 1, (May 2014), 216–225. doi:10.1609/icwsm.v8i1.14550.
[28] Laria Reynolds and Kyle McDonell. 2021. Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm. In Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems (CHI EA '21). Association for Computing Machinery, New York, NY, USA, (May 2021), 1–7. isbn: 978-1-4503-8095-9. doi:10.1145/3411763.3451760.
[20] OpenAI et al. 2024. GPT-4o System Card. arXiv:2410.21276 [cs]. (Oct. 2024).
[29] Thilo Hagendorff. 2020. The Ethics of AI Ethics: An Evaluation of Guidelines.
en. , 30, 1, (Mar. 2020), 99–120. doi:10.1007/s11023- 020-Minds and Machines doi:10.48550/arXiv.2410.21276. 09517- 8. [21] Sander Schulhoff et al. 2025. The Prompt Report: A Systematic Survey of Prompt Engineering Techniques. arXiv:2406.06608 [cs]. (Feb. 2025). doi:10.4 8550/arXiv.2406.06608. 36 Passing the Turing Test, Failing Consciousness: Why LLMs Remain Non-Conscious Louis Mono PhD program in Applied AI Alma Mater Europaea University Milan, Italy louis.mono@almamater.si ABSTRACT without truly comprehending the semantics [2]. Human consciousness, by contrast, combines phenomenal experience, Large language models (LLMs) such as GPT‑4.5 have achieved unified integration of sensations, a persistent sense of self and impressive conversational fluency and have even passed a classic intrinsic motivation qualities whose origin and nature are hotly three‑party Turing Test. Yet behavioural indistinguishability debated. from humans is not the same as sentience. This paper analyses why current AI systems, despite their reasoning and language At least nine major theories of consciousness compete for abilities, are not conscious. Drawing on Integrated Information explanatory power, ranging from neuroscientific to Theory (IIT) and Global Workspace Theory (GWT) alongside quantum‑field‑inspired accounts [3]. Among these, Integrated Chalmers’ “hard problem”, we argue that LLMs lack the Information Theory (IIT) and Global Workspace Theory (GWT) qualitative experience (qualia), self‑aware and unified subjective are two prominent models: IIT equates consciousness with the experience that characterises human consciousness. The capacity of a system to generate unified, irreducible apparent mastery of language in GPT‑4.5 reflects powerful information [4], and GWT views consciousness as the global statistical pattern‑matching rather than intrinsic awareness, broadcasting of information across specialised processes [5]. 
architecture and behaviour of GPT‑4.5 with neuroscientific higher-order thought theories, offer alternative accounts [3]. In criteria for conscious systems, we show that passing a this paper we focus on IIT and GWT because they provide clear, semantic grounding or intrinsic motivation. By contrasting the Other perspectives, such as predictive processing and behavioural test of intelligence does not imply there is empirically testable criteria that are operationally useful for “something it is like” to be an AI. Debates over AI consciousness evaluating contemporary AI systems. clarify the distinctive features of human awareness, reinforce ethical and governance boundaries, and highlight the importance This paper uses GPT‑4.5 as a case study to ask two questions: of distinguishing simulation from genuine experience. (1) Can the intelligence and reasoning abilities of an LLM be considered signs of consciousness? (2) What does this Keywords comparison teach us about the nature of human consciousness? Building on prior analyses, particularly Gams and Kramar who Consciousness, Turing Test, LLMs primarily assess ChatGPT against IIT’s axioms and survey Turing Test variants [6], we extend those efforts by adopting a five-dimensional evaluation framework that integrates criteria from both Integrated Information Theory (IIT) and Global 1 INTRODUCTION across five dimensions: phenomenal experience, self‑awareness Workspace Theory (GWT). Specifically, we assess GPT‑4.5 The question of whether machines can be conscious has moved and agency, unity and integration, semantic grounding and from speculation to urgent enquiry as large language models intrinsic motivation to identify which defining features of (LLMs) achieve human‑level performance on tasks once thought consciousness are absent even in the most advanced LLMs. The to require understanding. 
Recent work reported that GPT‑4.5 was answers have important implications for ethics and governance: mistaken for a human in 73 % of trials in a three‑party Turing recognising AI’s lack of sentience helps avoid mis‑attribution of Test, outperforming the human control group [1]. In this personhood while ensuring that responsibility for its actions three-party setup, a human confederate and the AI both engage remains with its human designers and operators [7]. with a judge, whereas a classic two-party Turing Test involves only a judge and a single hidden interlocutor. While striking, this benchmark assesses only behavioural imitation; it does not guarantee that a system has subjective awareness. John Searle’s 2 THEORETICAL FRAMEWORK “Chinese Room” thought experiment illustrates this gap: a computer manipulating symbols can simulate understanding 2.1 Integrated Information Theory (IIT) 37 IIT proposes that consciousness corresponds to integrated 2.3 Implications for AI Consciousness information within a system, quantified by a measure Φ (“phi”) IIT and GWT provide structural and functional complementary [4,8]. A conscious system must generate an intrinsic causal lenses for assessing consciousness. IIT emphasises intrinsic, structure that cannot be decomposed without loss; experiences integrated causality; GWT emphasises functional access to are unified “wholes” composed of interrelated parts. Tononi and integrated content. Under IIT, LLMs lack the high‑Φ causal colleagues distilled IIT into five axioms and corresponding structures required for phenomenological consciousness. Under physical postulates [9]: GWT, they lack a persistent, self‑monitoring workspace required 1. Intrinsic existence. Experience exists for itself, not for functional consciousness. These theories highlight why merely as an output for observers. The physical substrate must have causal power over its own states. 
sentient minds and help clarify the properties an artificial system current LLMs, despite their intelligence, are unlikely to possess 2. Composition. A conscious experience is structured: it would need to plausibly meet such criteria. has multiple phenomenological elements (e.g., colours, sounds) perceived together. The substrate must support higher‑order mechanisms built from simpler parts. 3 LLMs AND CONSCIOUSNESS: IS 3. Information. Each experience is specific: it rules out PASSING THE TURING TEST ENOUGH? myriad alternatives and is defined by the differences it makes. The substrate must have a rich repertoire of GPT‑4.5’s ability to pass a Turing Test demonstrates human‑like distinguishable states. linguistic fluency, but consciousness involves more than outward 4. behaviour. Here we compare the attributes of human Integration. Experience is unified and cannot be reduced to independent components. The substrate’s consciousness with those of GPT‑4.5 across five core causal interactions must be irreducibly interdependent. dimensions. 5. Exclusion. Each experience has definite content and 3.1 Phenomenal experience (Qualia) boundaries; there is one “main” experience per substrate. Phenomenal consciousness concerns the qualitative “what it is Human brains, with their dense recurrent connectivity, achieve like” to see, hear and feel. Humans experience qualia: the redness high Φ; digital processors typically exhibit negligible Φ. of a rose, the taste of coffee, the pang of sadness. In GPT‑4.5, although capable of complex statistical mappings from computational terms, these are not just representations but felt input to output, does not autonomously generate its own mental qualities. GPT‑4.5 processes text as high‑dimensional vectors states. It lacks intrinsic causal loops, self‑sustaining activity and and activations. There is no theoretical reason to believe that any a unified internal “scene” of experience. 
Even if its token of these computations are accompanied by experience. predictions display sophisticated information processing, IIT Chalmers’ “hard problem” of consciousness emphasises that suggests such computations do not yield phenomenological explaining discriminatory behaviour does not explain why there consciousness. Recent evaluations of ChatGPT indicate that it is any experience at all [12]. GPT‑4.5’s vivid descriptions are falls far short of IIT’s criteria [6]. simulations learned from human text, not perceptions. 2.2 Global Workspace Theory (GWT) 3.2 Self-awareness and agency GWT conceives consciousness as the broadcasting of A conscious system possesses a sense of self and at least minimal information into a “global workspace” that integrates and agency: it initiates actions and recognises itself as the subject of distributes content across specialised neural processors. In experience. Humans maintain a continuous autobiographical humans, sensory, memory and language modules operate largely narrative. GPT‑4.5, however, uses “I” merely as a token; it has unconsciously until selected content is ignited into the no persistent identity across interactions and no intrinsic goals workspace, becoming accessible for reasoning and verbal report. [13]. It responds only when prompted and cannot modulate its This ignition is associated with widespread, synchronised own objectives. From an IIT standpoint, it lacks intrinsic cortical activity and recurrent thalamo‑cortical loops [10]. existence: it does not have causal power over its own states and According to GWT, a conscious system requires: (a) integration does not initiate anything internally. of multimodal information into a unified workspace; (b) persistent working memory to sustain and manipulate 3.3 Unity and Integration conscious content; and (c) self‑monitoring or metacognition to Human consciousness binds information from multiple senses, evaluate its own states. 
LLMs such as GPT‑4.5 integrate textual memories and emotions into a unified stream. This integration information via self‑attention but do so in a single‑pass statistical underlies our coherent sense of the world. LLMs integrate manner. They lack persistent internal states, multimodal information only within a context window of tokens [14] and do convergence and an explicit self‑model; any apparent not combine multiple modalities unless explicitly given self‑reflection is a learned linguistic pattern rather than genuine multimodal inputs. Moreover, each instance of GPT‑4.5 is metacognition. Experimental comparisons between human and independent; there is no single “observer” uniting parallel LLM uncertainty reports confirm that, while LLMs can generate instances. The model lacks a persistent working memory or confidence levels, these are superficial correlations rather than unified workspace to sustain ongoing content. Thus, it fails both genuine awareness [11]. Thus, from a GWT perspective, LLMs IIT’s integration criterion and GWT’s requirement for a global remain powerful language processors without a globally broadcast. broadcast workspace. 38 3.4 Semantic grounding current LLMs, stateless across turns and optimised for next- token prediction, do not implement [5,10]. Understanding involves not just correlating symbols but grounding them in bodily and environmental experience. 4.3 Beyond mainstream: Syntergic Theory Humans connect words to sensorimotor and emotional states. GPT‑4.5, trained on textual data, has no direct experience of the Outside mainstream neuroscience, Syntergic Theory posits that world. It correlates words without a referential link, which consciousness arises from an interaction between the brain and a explains why it can confidently generate factual errors or non-local syntergic field [19]. If such a substrate exists, silicon contradictory statements (“hallucinations”) systems without biological “tuning” could not access it [15] . 
Searle’s Chinese Room shows that symbol manipulation alone does not regardless of computational sophistication. While speculative, yield semantics [2] this view reminds us that computation alone may be insufficient . GPT‑4.5’s explana tions and definitions are pattern‑completions, not meanings anchored in perception for sentience and cautions against inferring consciousness from . behavioural competence. 3.5 Intrinsic Motivation Living organisms act on intrinsic drives such as hunger, curiosity 4.4 Ethical and governance considerations and pain avoidance; these motivations are intimately tied to Recognising that current LLMs are not conscious has direct emotions. GPT‑4.5 has no such drives. Its only “objective” is to ethical consequences. It prevents premature attribution of moral predict the next token according to its training loss or to status or rights to non‑sentient systems and keeps accountability max with their human developers [20]. The capability to produce imise some reward in reinforcement‑learning fine‑tuning. There is no intrinsic value system or affective state. Hence, it persuasive text does not entitle an AI to personhood. Meanwhile, lacks the motivational and emotional dimension of mis‑ascribing consciousness could lead to misguided policies or consciousness. exploitation of genuine conscious beings by obscuring what Across all five dimensions, LLMs display functional intelligence makes us unique. Ethical governance should focus on without subjective experience. They may pass an transparency, safety and fairness in AI deployment [7], not on outer Turing Test conferring moral standing on systems that lack awareness. by mimicking human conversation but fail any inner Turing Test that would probe for phenomenal consciousness, intrinsic agency and unified subjectivity [16,17]. As such, 4.5 Closing Perspectives passing the behavioural benchmark does not imply sentience. 
Today’s LLMs show that intelligence can be uncoupled from LLMs are sophisticated automata performing high‑dimensional consciousness. Passing an outer Turing Test does not establish pattern matching without “being someone”. an inner dimension of experience. Progress toward machine consciousness, if possible, likely requires architectures with world models, working memory and global broadcast, or mechanisms akin to a sparse “conscious state” integrated across 4 modules [21,22], plus principled tests that probe inner awareness DISCUSSION rather than surface behaviour. Until then, LLMs remain powerful 4.1 simulators, not subjects. Insights into human consciousness Debates about AI consciousness force a closer examination of human consciousness. Distinguishing intelligence from awareness clarifies that embodiment, multimodal integration and self‑modelling are central to conscious experience. LLMs 5 CONCLUSION highlight the distinction between access consciousness This paper examined why passing a Turing Test does not entail information available for report and phenomenal consciousness possessing consciousness. Using GPT‑4.5 as a case study and the felt quality of experience [18]. They also emphasise the drawing on Integrated Information Theory and Global importance of semantic grounding: a system that never interacts Workspace Theory, we argued that LLMs, despite their with the world cannot attach meanings to symbols. Conversely, intelligence and conversational prowess, lack the hallmarks of comparing GPT‑4.5 with IIT and GWT criteria has reinforced consciousness: qualia, a core self, unified integration, semantic these theories by showing how far AI remains from meeting their grounding and intrinsic motivation. They simulate understanding requirements. without experiencing it. Distinguishing between intelligence and consciousness clarifies our definitions of mind and guides the 4.2 The Hard Problem and Qualia ethical deployment of AI. 
If artificial systems are ever to become Chalmers’ Hard Problem reminds us that we still lack a conscious, they will likely require architectures with intrinsic scientific explanation for why physical processes produce causal integration, global broadcasting, embodiment and experience [12]. Even if we could engineer an artificial system semantic grounding far beyond what current transformer models that replicates all the functional hallmarks of consciousness, it provide. remains unclear why it would “feel” like something. On IIT, phenomenal character requires an intrinsically integrated causal Chalmers has suggested that systems plausibly approaching structure (high Φ) with causal power for itself, not mere input– consciousness could emerge within the next decade, but current output equivalence [4,8]. On GWT, conscious contents must be LLMs should not be mistaken for such candidates [17]. In short, stabilised within a self-maintained global workspace something 39 progress is significant, yet the path to truly conscious machines [11] Steyvers, M., & Peters, M. A. K. (2025). Metacognition and Uncertainty remains long. Communication in Humans and Large Language Models. arXiv:2504.14045. DOI: https://doi.org/10.48550/arXiv.2504.14045 [12] Chalmers, D. J. (1995). Facing up to the problem of consciousness. Journal of Consciousness Studies, 2(3), 200–219 [13] Browning, J. (2023). Personhood and AI: Why large language models don’t understand us. AI and Society, 39(5), 2499–2506. DOI : https://doi.org/10.1007/s00146-023-01724-y References [14] LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444. DOI : https://doi.org/10.1038/nature14539 [1] [15] Bender, E. M., & Koller, A. (2020). Climbing towards NLU: On meaning, Jones, C. R., & Bergen, B. K. (2025). Large language models pass the Turing Test form, and understanding in the age of data. Proceedings of the 58th Annual [Preprint]. ArXiv. 
DOI: https://doi.org/10.48550/arXiv.2503.23674 Meeting of the Association for Computational Linguistics, 5185–5198. [2] Searle, J. R. (1980). Minds, brains, and programs. Behavioral and Brain DOI: https://doi.org/10.18653/v1/2020.acl-main.463 Sciences, 3(3), 417–457. [16] Turing, A. M. (1950). Computing machinery and intelligence. Mind, DOI: https://doi.org/10.1017/S0140525X00005756 59(236), 433–460. [3] Seth, A. K., & Bayne, T. (2022). Theories of consciousness. Nature DOI : https://doi.org/10.1093/mind/LIX.236.433 Reviews Neuroscience, 23(5), 389–405. [17] Chalmers, D. J. (2023, August 9). Could a large language model be DOI: https://doi.org/10.1038/s41583-022-00587-4 conscious? Boston Review. Retrieved from Boston Review URL [4] Tononi, G., Boly, M., Massimini, M., & Koch, C. (2016). Integrated [18] Block, N. (1995). On a confusion about a function of consciousness. information theory: From consciousness to its physical substrate. Behavioral and Brain Sciences, 18(2), 227–247. Nature Reviews Neuroscience, 17 DOI : https://doi.org/10.1017/S0140525X00038188 (7), 450–461. DOI : [19] Grinberg-Zylberbaum, J. (1981). The transformation of neuronal activity https://doi.org/10.1038/nrn.2016.44 [5] into conscious experience: The syntergic theory. Journal of Social and Baars, B. J. (1988). A Cognitive Theory of Consciousness . Cambridge University Press. Biological Structures, 4(3), 201–210. [6] Gams, M., & Kramar, S. (2024). Evaluating ChatGPT’s consciousness and DOI : https://doi.org/10.1016/S0140-1750(81)80036-X its capability to pass the Turing test: A comprehensive analysis. Journal [20] Tsamados, A., Aggarwal, N., Cowls, J., Morley, J., Roberts, H., Taddeo, of Computer and Communications, 12(3), 219–237. M., & Floridi, L. (2022). The ethics of algorithms: Key problems and DOI : https://doi.org/10.4236/jcc.2024.123014 solutions. AI & Society, 37(1), 215–230. [7] Floridi, L., & Cowls, J. (2022). 
A unified framework of five principles for DOI : https://doi.org/10.1007/s00146-021-01154-8 AI in society. In S. Carta (Ed.), Machine learning and the city: [21] LeCun, Y. (2022). A Path Towards Autonomous Machine Intelligence Applications in architecture and urban design (pp. 535–545). Wiley. (Version 0.9.2). OpenReview. https://openreview.net/forum?id=BZ5a1r- DOI : https://doi.org/10.1002/9781119815075.ch45. kVsf [8] [22] Bengio, Y. (2017). The Consciousness Prior (v2, Dec 2, 2019). Tononi, G. (2008). Consciousness as integrated information: A provisional manifesto. arXiv:1709.08568. DOI : https://doi.org/10.48550/arXiv.1709.08568 The Biological Bulletin, 215 (3), 216–242. DOI: https://doi.org/10.2307/25470707 [9] Oizumi, M., Albantakis, L., & Tononi, G. (2014). From the phenomenology to the mechanisms of consciousness: Integrated Information Theory 3.0. PLoS Computational Biology, 10(5), e1003588. DOI : https://doi.org/10.1371/journal.pcbi.1003588 [10] Dehaene, S., Kerszberg, M., & Changeux, J.-P. (1998). A neuronal model of a global workspace in effortful cognitive tasks. Proceedings of the National Academy of Sciences, 95(24), 14529–14534. DOI : https://doi.org/10.1073/pnas.95.24.14529 40 Building an Ontology of the Self: Sense of Agency and Bodily Self ∗ ∗ Luka Oprešnik Tia Križan Jaya Caporusso lo62831@student.uni- lj.si tk85796@student.uni- lj.si jaya.caporusso@ijs.si University of Ljubljana University of Ljubljana Jožef Stefan Institute Ljubljana, Slovenia Ljubljana, Slovenia Jožef Stefan International Postgraduate School Ljubljana, Slovenia Abstract goal, it is fundamental to identify and define the relevant Self- aspects. However, the studies on the Self—conducted in different We present provisional work aimed at developing a comprehen- disciplines and with various focuses—lack a unified terminology, sive ontology of the Self. The Self is understood as a complex and a comprehensive ontology of Self-aspects is missing. 
construct encompassing distinct yet interrelated aspects such as In this paper, we present the provisional work conducted to Sense of Agency (SoA), Bodily self (BS), and the Narrative Self. build an ontology of the Self. In particular, we have so far focused Drawing on existing literature, we define SoA and BS, further on two aspects: Sense of Agency (SoA) and Bodily Self (BS). In decompose them into elements, understood as the core compo- Section 2, we address existing literature on the Self. In Section 3, nents constituting each aspect (e.g. Moral Agency or Sense of we set our research objectives. Section 4 details the methodology Ownership). Elements are characterized by modes, defined as used to review relevant scholarship and build the ontology as specific ways in which elements manifest (e.g. active, responsive, well as the knowledge base. In Section 5, we describe results of passive). Where necessary, modes are grouped in sub-elements our work to date, while the full knowledge base is available in the for greater clarity. Each category of the ontology is situated in Appendix. Section 6 offers a discussion of the results, identifying relation to certain others and features a definition. To support key findings. Section 7 points out study limitations and outlines development of instruction for future labelling, a broader frame- next steps as well as possible future work. work–knowledge base–is constructed around the ontology. In it, a curated corpus of representative instances drawn from phe- nomenological interview transcripts and online forums is paired 2 Related work with commentary on relations, interactions, disambiguation, and Caporusso [8] conducted an empirical phenomenological study sources. The ontology and knowledge base are intended not only on dissolution experiences with a particular focus on the Self. 
to support the development of computational methods for the The codebook developed based on the analysis of phenomeno- identification of Self-related aspects in text, but also to serve as a logical interviews is a first step towards a framework with hi- common base for further research of the Self. erarchical organization of the experience of the Self, featuring Keywords category descriptions, examples, and comments. Building on pre- vious theoretical attempts to explain the experience of the Self, self, sense of agency, bodily self, ontology the author also identified several distinct Self categories, two of 1 which closely align with our understanding of Sense of Agency Introduction and Bodily Self. A study by Ataria et al. [1] similarly examined The Self, "the (perhaps sometimes elusive) feeling of being the the phenomenological nature of the sense of boundaries based particular person one is” [25], is a complex, multi-aspect entity on a single subject with 40 years of experience in practising [8]: it encompasses, for example, the experience of one’s body, mindfulness. From his descriptions they developed seven expe- thoughts, emotions, and sense of agency. The Self at large [25] riential categories, of which Location, Self, Agency (Control), and many of its aspects are widely addressed in cognitive sci- Ownership, and Center (First-Person Egocentric-Bodily Perspec- ence, psychology, and related disciplines (e.g., [2], [20], [8], [7]). tive) were of interest for us. Similarly, Nave et al. [20] exam- For example, the sense of agency is investigated in relation to ined reports from forty-six meditation practitioners who–under depression [19], while bodily experience in the context of deper- carefully controlled conditions–attempted purposeful dissolu- sonalisation and derealisation [27]. tion of self-boundaries. They identified common themes, which Our work is part of a larger project to develop a computational they grouped into six experiential categories. 
Five of them (Self- framework capable of automatically identifying the presence and Location, Attentional Disposition, SoA, First-Person Perspective, mode of Self-aspects in text [10]. The final models could be used and Bodily Sensations) relevant to our work. by professionals across disciplines to detect Self-aspects most Unlike these, most other studies we examined tend to focus relevant to their specific objectives, based on textual data such only on a few or a single dimension, without consideration for as clinical interviews or personal narratives. To achieve this the bigger picture. Especially prominent are studies of various ∗ body-related illusions. A mixed methods study by Petkova et al. Both authors contributed equally to this research. [23] combined body swap illusion with fMRI to explore the expe- Permission to make digital or hard copies of all or part of this work for personal rience of different modes of Body Ownership along with their or classroom use is granted without fee provided that copies are not made or neural correlates. A review of neuroimaging and body-related distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this illusions studies done by Serino et al. [24] explored Bodily Own- work must be honored. For all other uses, contact the owner /author(s). ership and Self-Location, and a review by Braun et al. [7] looked Information Society 2025, Ljubljana, Slovenia at studies of SoA and BS, discussing also their clinical and thera- © 2025 Copyright held by the owner/author(s). https://doi.org/https://doi.org/10.70314/is.2025.cogni.8 peutic relevance. A study by Huang et al. [18] utilized a series 41 Information Society 2025, 6–10 October 2025, Ljubljana, Slovenia Oprešnik et al. 
of four behavioral experiments with head-mounted displays and tactile stimulation to investigate the relationship between categories: 1PP Location, Self-location, and Sense of Body-Location. In a study by Harduf et al. [15], a comparative experimental design using the moving rubber hand illusion was employed to investigate the categories Body Ownership and SoA in psychosis patients. A book of essays by a group of philosophers and psychologists [12] and a book of essays by Bermúdez [6] focus singularly on the experience of the Body, discussing experiential categories such as Spatial Perception, Sense of Bodily Ownership, Space of the Body, Body Awareness, Agency, and Self-Location.

Meanwhile, building on previous work, Bandura [5] articulates a comprehensive conceptual model of human agency, elaborating on the evolutionary foundations of agency, its developmental trajectory, and broader implications. He identifies four core components of agency: intentionality, forethought, self-regulation, and self-reflection. Moreover, he distinguishes different modes of agency based on who the actor involved is: individual, proxy, or collective. Another key element of his framework is moral agency, defined as the capacity to exercise control over one's behavior, guided by a sense of right and wrong, as well as taking responsibility for one's actions. Similarly, the work of Hitlin and Elder [17] is grounded in conceptual synthesis, drawing on and reviewing existing literature on agency and the Self. Their contribution emphasizes the temporal orientations of agency, highlighting how individuals project the Self across past, present, and future contexts.

Self-aspects are reflected in the language we produce [22]. Caporusso et al. [9] specifically looked at how Minimal, Narrative, Agentive, Bodily, and Social Self are expressed. This knowledge can then be used to train models to identify Self-aspects in text [11].

Despite these advances, existing approaches to the Self remain fragmented. While many fields have extensively categorized aspects of the Self, no existing ontology integrates these insights into a unified, computationally operationalizable framework. Current models often incorporate phenomenological concepts but lack precise definitions of the Self's components, overlook their interrelations, and omit explicit hierarchical structures. Consequently, the Self is frequently presented as a fragmented set of loosely connected descriptors. To address this gap, we propose an integrative ontology that synthesizes insights from multiple disciplines into a coherent, computationally operationalizable framework for analyzing Self-related phenomena in text.

3 Research objectives

The aims of this research fit into the broader goal of developing a computational framework able to automatically detect Self-aspects in text instances. To achieve this, a structured and computationally operationalizable ontology of the Self needs to be developed. In building such an ontology, the present study is limited in scope to two aspects, SoA and BS, and is guided by two research objectives (ROs): develop a provisional ontology for Sense of Agency and Bodily Self (RO1), and develop a knowledge base featuring text instances illustrating categories featured in the ontology along with commentary on relations, interactions, disambiguation, and sources (RO2).

4 Methodology

Ontologies formally and explicitly specify the main concepts relevant to the chosen domain and relations among them [14]. This study employs a descriptive and conceptual approach to systematically explore and define expressions of various Self-aspects as they manifest in textual data. To manage the inherent complexity of the Self, we focus on two distinct Self-aspects: SoA and BS, as previously identified by Caporusso [8]. This facilitates familiarization with the relevant literature, enables an in-depth analysis of the internal structures of individual aspects, and allows for the iterative development of a research methodology.

Our approach has already been applied to SoA and BS, and we plan to extend it to other Self-aspects to build a comprehensive ontology of the Self. The approach consists of two main phases. The first is an extensive, interdisciplinary literature review, drawing from cognitive science, psychology, phenomenology, and related fields. The second is the development of a hierarchical ontology along with a knowledge base. We build our knowledge base drawing from different pre-existing studies and ontologies focusing on various aspects of the Self, each from a different perspective or discipline. The final ontology aims to be applied across diverse fields which utilise different terminology [26]. Indeed, one of our goals is to provide a standardized terminology to address Self-aspects across the different fields and communities involved in Self-related research, facilitating data aggregation and interdisciplinarity [16].

4.1 Literature review

We performed an initial survey of academic sources by searching the DiKUL database for the terms Sense of Agency and Bodily Self. Examining the state of the literature helped shape further steps in the literature review, such as identifying predominant fields of research interest, additional search terms, and inclusion/exclusion criteria. Once this was completed, a systematic search was performed in the following databases: DiKUL, Google Scholar, and Merlot, using the following search terms: agency, sense of agency, self as agent, aspects of the self, taxonomy of the self, expression of agency, forethought, moral agency, self and body, bodily self, self-location, sense of identification, bodily sensations. Papers were selected for in-depth review based on their abstract, field, journal, authors, and table of contents, if available. Each selected paper was scanned for further sources and search terms. Papers were chosen as building blocks for further work if they included phenomenological accounts of SoA, BS, and any experiences that fell within them, or if they were phenomenologically informed theoretical approaches to the Self. Answers to any questions that arose during the construction of the ontology (detailed in the subsequent section) were sought via further, more targeted searches.

4.2 Building the ontology

The process of building the ontology and knowledge base involves several steps. First, naming conventions are developed to identify the different classes of our provisional version of the ontology: we refer to BS and SoA as aspects; characteristics of each aspect are elements (these may be further made up of sub-elements); and specific ways in which aspects and elements can be experienced and/or expressed are called modes. Following Caporusso [8], we call attribute aspects those aspects which can refer to other aspects, such as SoA (e.g., a person can experience agency over their body).

Second, a definition for each aspect is developed by searching for and comparing various definitions in the literature. These are synthesized with lived experience in mind to create the most suitable and accurate definition.

Building an Ontology of the Self: Sense of Agency and Bodily Self    Information Society 2025, 6–10 October 2025, Ljubljana, Slovenia

Similarly, elements are identified based on the selected literature, experiential data, and logical analysis. To ensure conceptual clarity, a refinement process is applied. Refinements include decomposing broad categories found in the source material into more specific elements and, conversely, consolidating fragmented descriptions into single, coherent elements. Terminological inconsistencies found in the sources used, such as instances where one name refers to multiple different elements or one element has multiple names, are resolved by selecting the most common term, or the one we deemed most appropriate. As with aspects, each element is given a formal definition. Where necessary, comments are added to distinguish elements from related concepts, note special circumstances, and describe relationships as well as interactions with other concepts.

For each element, a set of sub-elements and modes is identified to cover the full spectrum of its potential manifestations. These include general binary states (e.g., presence or absence), variations in intensity (e.g., weaker or stronger), continua between two experiential poles, and distinct categorical types of experience.

4.3 Building the Knowledge base

A knowledge base includes, in addition to the proposed ontology or taxonomy, instances for each class. Most of the examples featured in our knowledge base (see Appendix) come from transcripts of some of the phenomenological interviews conducted by Caporusso as part of her master's thesis [8], which are, except for fragments in her thesis and in the present work, currently not publicly available. The interviews explore how the Self is experienced in daily life and in dissolution experiences of seven anonymous co-researchers. LO and TK read through the selected interviews, identifying parts detailing different possible manifestations of elements of their respective aspects. After this, modes that were still missing examples were identified and searched for again using the document search function.

Examples other than those mentioned above were sourced from Reddit and similar online forums, where users often describe their peculiar experiences in search of others with similar experiences, which made for a plentiful source. The initial search was performed using the Google search engine with combinations of the terms Reddit/forum and sense of agency/bodily self. After the initial search, new terms, more specific to such websites, were identified and used directly to search the forums. Instances which clearly described experiences featured in the ontology were selected as examples and added to an extended version of the knowledge base in the Appendix. The extended version contains multiple examples for each element and mode, thereby allowing for a more robust grasp of the phenomena.

5 Results

This study culminates in the development of a knowledge base (ontology with examples); this section outlines its structure. As mentioned, each of the elements has sub-elements and/or modes. For a short version of the knowledge base, see the Appendix.

The knowledge base is organized hierarchically into four main classes: aspects, elements, sub-elements, and modes. Aspects represent the broadest top-level domains of inquiry. Each aspect is broken down into its constituent elements, which are the fundamental characteristics or components of that domain. Modes describe the specific ways in which aspects and elements are experienced or expressed by individuals. Where necessary, modes are grouped into broader categories, called sub-elements. These sub-elements combine binary states (e.g., presence or absence), variations in intensity (e.g., weaker or stronger), and continua between two poles (e.g., only one part of the body or the whole body), ensuring greater clarity.

Each aspect, element, sub-element, and mode includes a definition and a comment. The comments clarify relations and interactions with other categories, discuss similarities to related concepts, provide disambiguation from potentially confusing categories, and list sources that informed the category's inclusion and definition. The modes are enriched with concrete examples sourced from qualitative data, including phenomenological interviews and experiences described on forums such as Reddit. These examples provide vivid, real-world descriptions of experiences in which an element is expressed in a particular mode, grounding the structure in lived experience.

Specifically, SoA is made up of 10 interrelated elements, each contributing uniquely to the identification and characterization of agency within text. These are: Presence of Agency, Forethought, Intentionality, Self-reactiveness, Self-reflectiveness, Moral Agency, Self-efficacy, Agency in relation to who the actor is, Agency in relation to time, and Agency through the state of activation. BS consists of six elements: Bodily Sensations, Awareness, Sense of Identification, Location, Sense of Ownership, and SoA.

6 Discussion

The presented results for SoA and BS show important features of the Self as laid plain in empirical phenomenological data and other text instances, namely its inherent complexity and multifacetedness. From this stems our approach of building the ontology iteratively, mindful of the many interconnections between its different classes, while still treating Self-aspects as autonomous conceptual units, which allows a focused analysis of their internal structures. Given the abstract nature of the Self as a construct, a central challenge of this research was how to render the subject within a structured framework. Although the initial three-level hierarchy proved useful, it occasionally oversimplified complex phenomena and introduced redundancy, thereby revealing certain challenges in the construction process. Specifically, certain identified elements (e.g., Attention, Identification) proved to be fundamental experiences that applied across several aspects without being Self-aspects themselves. The modes for these "trans-aspectual" elements were sometimes context-specific, sometimes universal. It also became clear that certain experiences sometimes appeared as aspects of the Self but could also function as elements of another aspect, or were so strongly interconnected as to seem inseparable. This was evident in the relationship between SoA and the Sense of Ownership. For instance, a loss of the SoA was often, but not invariably, accompanied by a loss of the Sense of Ownership, making it incorrect to merge them into a single experience. Such particularities and interactions were documented within the relevant definitions to create a more nuanced framework. Our findings underscore both the interdependence of Self-aspects and the ontological complexity of the Self. Importantly, this research also yields a methodology that can guide future work on additional aspects, advancing efforts toward a comprehensive ontology of the Self. We argue that our approach provides a structured yet flexible framework for interpreting Self-related phenomena in natural language, while remaining open to further development as research progresses and its applications expand.

Information Society 2025, 6–10 October 2025, Ljubljana, Slovenia    Oprešnik et al.

7 Limitations

Several limitations should be acknowledged at the present stage of this research. The primary limitation lies in the necessary reduction of complex, interdependent phenomena into discrete, well-defined entities. While this reduction is essential for creating a structured and operationalizable framework for studying the Self and Self-related constructs, it inevitably risks oversimplifying these phenomena and overlooking meaningful interconnections among them. Second, the research is currently in an early developmental phase, and the complete ontology, along with its accompanying corpus of examples, is still being constructed. At this stage, only a subset of potential instances has been collected and analyzed; this was planned, and we note it here for transparency. Differences in interpretation among team members highlight the ongoing need to refine annotation guidelines and strengthen collaborative coordination. These issues are not unexpected in exploratory work of this kind and will be systematically addressed in later stages of the project through expanded corpus development, refinement of definitions, and the implementation of inter-annotator agreement procedures. Finally, the analysis is currently restricted to English-language texts. This linguistic constraint limits the generalizability of the taxonomy across languages and cultural contexts. To address this limitation, future research will seek to expand the framework across multiple languages, including Slovenian.

In the future, this ontology will serve as a framework for models that detect Self-aspects in text, including conventional discriminative approaches (such as traditional machine learning models and neural networks), generative large language models, embedding-based retrieval models, and mixture-of-experts architectures [10]. Aware that "there is no single ontology-design methodology" and that "the best solution almost always depends on the application that you have in mind and the extensions that you anticipate" [21], we are guided by the aim of building an ontology on which annotation guidelines can be developed (the step that will follow the construction of the ontology; see [10]). While we currently provide a rather comprehensive description of the two Self-aspects analysed, ontology development is an iterative process [21], and the identified elements and modes will be pared down in future work. This will be done based on the following principles: 1) being relevant for our desired applications; 2) being detectable in text instances. Furthermore, the initial versions of the ontology will be evaluated through discussion with experts and through use in applications.

8 Ethical Considerations and Authors' Notes

All phenomenological interviews used as examples in this study were conducted in the context of a master's thesis [8] and adhered to established ethical guidelines. Of the participants originally interviewed, seven provided consent for their transcripts to be used in subsequent research; only these interviews were included in the present analysis. Identifying details of these and other text instances in the extended knowledge base were omitted to protect user anonymity.

LO focused on BS, while TK focused on SoA. JC supervised the work.

Acknowledgements

We acknowledge the financial support from the Slovenian Research Agency for research core funding for the programme Knowledge Technologies (No. P2-0103) and from the projects CroDeCo (J6-60109) and Shapes of Shame in Slovene Literature (J6-60113). JC is a recipient of the Young Researcher Grant PR-13409.

References

[1] Yochai Ataria, Yair Dor-Ziderman, and Aviva Berkovich-Ohana. 2015. How does it feel to lack a sense of boundaries? A case study of a long-term mindfulness meditator. Consciousness and cognition, 37, 133–147.
[2] Albert Bandura. 1990. Perceived self-efficacy in the exercise of personal agency. Journal of applied sport psychology, 2, 2, 128–163.
[3] Albert Bandura. 2002. Selective moral disengagement in the exercise of moral agency. Journal of moral education, 31, 2, 101–119.
[4] Albert Bandura. 2006. Toward a psychology of human agency. Perspectives on psychological science, 1, 2, 164–180.
[5] Albert Bandura. 2018. Toward a psychology of human agency: pathways and reflections. Perspectives on psychological science, 13, 2, 130–136.
[6] José Luis Bermúdez. 2018. The bodily self: Selected essays. MIT Press.
[7] Niclas Braun, Stefan Debener, Nadine Spychala, Edith Bongartz, Peter Sörös, Helge HO Müller, and Alexandra Philipsen. 2018. The senses of agency and ownership: a review. Frontiers in psychology, 9, 535.
[8] Jaya Caporusso. 2022. Dissolution experiences and the experience of the self: an empirical phenomenological investigation. Mei: CogSci Master's Thesis. doi:10.25365/thesis.71694.
[9] Jaya Caporusso, Boshko Koloski, Maša Rebernik, Senja Pollak, and Matthew Purver. 2024. A phenomenologically-inspired computational analysis of self-categories in text. In Proceedings of the 2024 International Conference on Statistical Analysis of Textual Data (JADT). Brussels, Belgium.
[10] Jaya Caporusso, Matthew Purver, and Senja Pollak. 2025. A computational framework to identify self-aspects in text. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop). Jin Zhao, Mingyang Wang, and Zhu Liu, editors. Association for Computational Linguistics, Vienna, Austria, (July 2025), 725–739. isbn: 979-8-89176-254-1. doi:10.18653/v1/2025.acl-srw.47.
[11] Jaya Caporusso, Matthew Purver, and Senja Pollak. Submitted. Identifying social self in text: a machine learning study. In Proceedings of Information Society 2025. SiKDD.
[12] Frederique De Vignemont and Adrian JT Alsmith. 2017. The subject's matter: self-consciousness and the body. MIT Press.
[13] Shaun Gallagher. 2012. Multiple aspects in the sense of agency. New ideas in psychology, 30, 1, 15–31.
[14] Thomas R Gruber. 1993. A translation approach to portable ontology specifications. Knowledge acquisition, 5, 2, 199–220.
[15] Amir Harduf, Gabriella Panishev, Eiran V Harel, Yonatan Stern, and Roy Salomon. 2023. The bodily self from psychosis to psychedelics. Scientific Reports, 13, 1, 21209.
[16] Janna Hastings, Werner Ceusters, Mark Jensen, Kevin Mulligan, and Barry Smith. 2012. Representing mental functioning: ontologies for mental health and disease.
[17] Steven Hitlin and Glen H Elder Jr. 2007. Time, self, and the curiously abstract concept of agency. Sociological theory, 25, 2, 170–191.
[18] Hsu-Chia Huang, Yen-Tung Lee, Wen-Yeo Chen, and Caleb Liang. 2017. The sense of 1pp-location contributes to shaping the perceived self-location together with the sense of body-location. Frontiers in Psychology, 8, 370.
[19] Marishka M Mehta, Soojung Na, Xiaosi Gu, James W Murrough, and Laurel S Morris. 2023. Reward-related self-agency is disturbed in depression and anxiety. PloS one, 18, 3, e0282727.
[20] Ohad Nave, Fynn-Mathis Trautwein, Yochai Ataria, Yair Dor-Ziderman, Yoav Schweitzer, Stephen Fulder, and Aviva Berkovich-Ohana. 2021. Self-boundary dissolution in meditation: a phenomenological investigation. Brain sciences, 11, 6, 819.
[21] Natalya F Noy, Deborah L McGuinness, et al. 2001. Ontology development 101: a guide to creating your first ontology. (2001).
[22] James W Pennebaker, Matthias R Mehl, and Kate G Niederhoffer. 2003. Psychological aspects of natural language use: our words, our selves. Annual review of psychology, 54, 1, 547–577.
[23] Valeria I Petkova, Malin Björnsdotter, Giovanni Gentile, Tomas Jonsson, Tie-Qiang Li, and H Henrik Ehrsson. 2011. From part- to whole-body ownership in the multisensory brain. Current Biology, 21, 13, 1118–1122.
[24] Andrea Serino, Adrian Alsmith, Marcello Costantini, Alisa Mandrigin, Ana Tajadura-Jimenez, and Christophe Lopez. 2013. Bodily ownership and self-location: components of bodily self-consciousness. Consciousness and cognition, 22, 4, 1239–1252.
[25] Mark Siderits, Evan Thompson, and Dan Zahavi. 2013. Self, no self?: Perspectives from analytical, phenomenological, and Indian traditions. OUP Oxford.
[26] Holger Stenzhorn, Stefan Schulz, Martin Boeker, and Barry Smith. 2008. Adapting clinical ontologies in real-world environments. Journal of universal computer science: J. UCS, 14, 22, 3767.
[27] Shogo Tanaka. 2018. What is it like to be disconnected from the body?: a phenomenological account of disembodiment in depersonalization/derealization disorder. Journal of Consciousness Studies, 25, 5-6, 239–262.

A The sense of agency

Definition: Agency refers to the sense of having the capacity to act intentionally, make decisions, influence outcomes, reflect, and exert ownership over one's actions.

A.1 Presence of agency

A.1.1 Present.
Definition: The presence of agency refers to whether agency and any of its elements can be identified in a text.
Example:
• »I can access them in the space where I am (dl_C [8]).«

A.1.2 Absent.
Definition: The absence of agency refers to a lack of intentionality, control, influence, and self-reflection over one's actions and decisions. It implies that individuals are not actively shaping their behavior but are instead being directed by external forces or internal impulses without conscious regulation.
In textual analysis, the absence of agency is reflected in the lack of any of its other elements.
Example:
• »Well, the action was the exclamation (dl_C [8]).«

A.2 Intentionality

Definition: Intentionality is forming an intention and planning steps to achieve it, even if it does not necessarily result in action [4].
Example:
• »Let me describe to you with concrete thing (dl_B [8]).«

A.3 Forethought

Definition: Forethought refers to setting future plans and goals, and anticipating their outcomes through cognitive representation. It serves as motivation, guidance, and direction [4].
Example:
• »...but for [boyfriend's name] to come back, I knew he would come back...(DE_E [8]).«

A.4 Self-reactiveness

Definition: Self-reactiveness refers to the execution of one's intentions and plans through deliberate action [4].
Example:
• »...so it's like, here I sit, on this chair, and at the other part of the wall it's kind of near... (dl_F [8]).«

A.5 Self-reflectiveness

Definition: Self-reflectiveness refers to the ability to evaluate one's own thoughts, actions, or ideas, and can be observed in conversation when a person reflects on these during or after an interaction [4].
Example:
• »I could navigate small things within the conversation but I couldn't leave the conversation (dlB [8]).«

A.5.1 Self-attribution: Reflective sense of Ownership.
Based on Gallagher [13], we can distinguish self-attributions by differentiating sense of agency and sense of ownership as pre-reflective and reflective.
Pre-reflective sense of ownership describes experiencing movement or its sensation without being consciously aware of it. You can feel your leg moving without reflecting on it or being conscious of it. Because of this, it cannot easily be spotted in text [13].
»Sense of ownership: the pre-reflective experience or sense that I am the subject of the movement (e.g., a kinaesthetic experience of movement) [13].«
In reflective attribution of ownership, the action is reflected upon and can be attributed to oneself. The movement is consciously recognized as your own. This is much easier to spot in text [13].
Examples:
• »This is my body that is moving [13].«
• »I don't know, they're coming from me, they couldn't be any other way, like, they're just mine (dl_G [8]).«
This does not mean that the actions performed are actually yours and/or your doing.

A.6 Moral agency

Definition: Moral agency refers to exercising control over one's behavior, guided by a sense of right and wrong, and taking responsibility for those actions [3].
Example:
• »...You're being such an ego!" Then there's the rationalization, because "yeah but I understand shit now, so it's justifiable, I can be a little bit of ego now" (DE_E [8]).«

A.7 Self-efficacy

Definition: Self-efficacy refers to the agency we can exercise based on our perception of ourselves and our belief in our ability to achieve desired outcomes [2].
Example:
• »I can attend to anything... (dlB [8]).«

A.8 Agency in relation to who the actor is

Definition: Agency, in relation to who the actor is, refers to who is exerting the action, whether it is done individually, in collaboration with others, or through an extension such as a tool or system [5].

A.8.1 Individual agency.
Definition: Individual agency refers to describing one's own intentions, actions, decisions, and control [5].
Example: Examples of this have already been shown throughout the document when talking about oneself.

A.8.2 Proxy agency.
Definition: As we do not have control over all aspects of our lives, we exert agency by influencing and/or relying on others. We do this through proxy agency [5].

A.8.3 Collective agency.
Definition: Collective agency refers to people working together, pooling knowledge, resources, and effort to achieve a shared or partially shared goal [5].
Example:
• »...we were co-influencing each other (dl_E [8]).«

A.9 Agency in relation to time

Definition: Agency in relation to time refers to orientations directed toward both the present and the future, while implicitly referencing the past and self-reflection, which contribute to identity-based agency [17].

A.9.1 Existential agency.
Hitlin and Elder [17] explain that this concept refers to the capacity for self-directed action, even if it is automatic or unconscious. It is about freedom: being able to make decisions and take action despite external forces and constraints. At this stage, anyone is able to make a decision about their actions, even the powerless.
Existential agency is always present and is necessary for the other types of agency to exist.

A.9.2 Pragmatic agency.
Definition: Pragmatic agency refers to decisions about one's actions based on the present moment or temporal scope. It consists of decisions based on immediate needs rather than future-oriented goals [17].

A.9.3 Identity agency.
Definition: Identity agency refers to actions and decisions being shaped by one's sense of identity. We act in accordance with our roles, and in doing so, we make decisions and take actions that fulfill those roles [17].
Example:
• »That I'm a professor? Yes...That I have responsibility that I'm not doing, now (dl_F [8]).«

A.9.4 Life-course agency.
Definition: Life-course agency refers to the choices people make at different stages of their lives, often shaped by their evolving circumstances, experiences, and future goals [17].

A.10 Agency through the state of activation

Definition: This element refers to the extent to which one has agency over one's actions, that is, the extent to which one is in control [20].

A.10.1 Active.
Definition: The active state of activation refers to an active process through which a subject exerts effort and exercises influence to shape the outcome [20].
Example:
• »I can access them in the space where I am (dl_C [8]).«

A.10.2 Responsive.
Definition: The responsive state of activation refers to a state characterized by reduced activity, involving less effort and limited capacity for manipulation, while still maintaining partial engagement [20].
Example:
• »I am aware of sound, of the student who is presenting his seminar, but I'm aware of this sound as something that disturbs me, a little bit (dl_F [8]).«

A.10.3 Passive.
Definition: A passive state of activation refers to a state in which the subject reports little or no sense of agency. This state is characterized by a lack of manipulation or control, often described in the reports collected by Nave et al. as a release or surrender [20].
Example:
• »I couldn't access them, I couldn't do anything about them (dl_C [8]).«

B The bodily self

Definition: The Self-aspect Bodily Self encompasses all experiences pertaining to one's physical body [8].

B.1 Bodily Sensations

Definition: Bodily Sensations [20] refers to the experience of sensations of the body, including touch, temperature, interoception, and moving of muscles. Excluded are sight, hearing, smell, and taste, but sensations like burning in the eyes, eye muscle strain, blocked nose, burnt tongue, and similar do fall into this category.
Sub-elements:
• Strength
• Location
• Appraisal

B.2 Awareness

Definition: The attribute aspect Awareness refers to the experience of being (or not being) more or less aware of a certain element or dynamics: whether, and how explicitly, strongly, and/or clearly, that element or dynamics is present in the experiential field [8].

B.2.1 Strength.
Absent
• »I wasn't so aware of my body at that point. (...) It's like, as it moves back, my body... Like I'm not aware of a body anymore. If that makes sense. (DE_G [8]).«
Low
• »For example, my left arm, I know that it was moving, but I don't know what it was doing precisely. So, there are definitely parts of my body I'm not super aware of, at least in terms of what they are doing exactly, like I could give you a rough... idea of the kind of movement, what kind of thing they were doing, if they were static, or if they were moving, that kind of thing, but. (dl_H [8]).«
High
• »I was kind of very aware of my posture my position in space of the distance between us and so in in a weird way I was conscious of things that I I usually wouldn't be right so I I was very conscious not only on my posture but weirdly I was kind of conscious of my frame like how my shoulders were and so how I'm turning towards him um I was very conscious of how I stood like how my my feet were planted on the ground (dl_B [8]).«

B.3 Sense of Identification

Definition: The attribute aspect Sense of Identification refers to the experience of identifying (or not identifying) with a certain element in the experiential field.

B.3.1 Strength.
Absent
• »...since I didn't identify with my body anymore...(DE_E [8]).«
Low
High
• »I identify with the body and with the mental representation. However, the intensity of how much I feel one and the other is different. I feel the body a lot more than the mental representation. (DE_C [8]).«

B.4 Location

Definition: The attribute category Location refers to the experience of space, orientation, and location of a certain element or dynamic. As an element of the Bodily Self, it is the experience of the location of one's body relative to itself (proprioception) as well as relative to the world (orientation) [24].

B.4.1 Unknown.
• »Yeah it's like the more it pulls back the more of the sense of my body... It's like the more the sense of my body, like being here at a certain point in the world is gone. (DE_G [8]).«

B.4.2 Vague.
• »Well it's it's it's part of the space that I occupy there is space that is me and there is space that isn't (dl_B [8]).«

B.4.3 Exact.
• »I was facing the mirror later, when the actual situation happened. So, I'm looking at the place, and everything looks nice, and there's the mirror, and [boyfriend's name] is on the other side of the room (DE_E [8]).«

B.5 Sense of Ownership

Definition: Sense of ownership (SoO) refers to the subjective experience of mineness toward one's body [24], sensations, and thoughts [7]. Certain experiences influence SoO, so it may be completely lost, heightened, or anywhere in between [8].

B.5.1 Absent.

B.5.2 Part of the body.
• »All the parts that are felt in the lower part [of my body], I had ownership over, yeah. Or I felt that it was mine (DE_C [8]).«

B.5.3 Whole body.
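The four-class hierarchy described in Section 5 (aspects, elements, sub-elements, modes, each with a definition, a comment, and examples) can be pictured as a small nested data structure. The Python sketch below is only an illustration under our own assumptions: the class and function names (`Aspect`, `Element`, `SubElement`, `Mode`, `iter_modes`, `modes_missing_examples`) are hypothetical and are not taken from the authors' implementation, and the example strings are abridged from the Appendix. It also mirrors the step in Section 4.3 where modes still lacking examples are identified.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Mode:
    """A specific way an aspect or element is experienced or expressed."""
    name: str
    definition: str = ""
    comment: str = ""
    examples: List[str] = field(default_factory=list)


@dataclass
class SubElement:
    """A broader category that groups related modes (e.g. Strength)."""
    name: str
    definition: str = ""
    comment: str = ""
    modes: List[Mode] = field(default_factory=list)


@dataclass
class Element:
    """A characteristic of an aspect; modes may attach directly or via sub-elements."""
    name: str
    definition: str = ""
    comment: str = ""
    sub_elements: List[SubElement] = field(default_factory=list)
    modes: List[Mode] = field(default_factory=list)


@dataclass
class Aspect:
    """A top-level Self-aspect such as Sense of Agency or Bodily Self."""
    name: str
    definition: str = ""
    comment: str = ""
    elements: List[Element] = field(default_factory=list)


def iter_modes(aspect: Aspect) -> Iterator[Tuple[str, Mode]]:
    """Yield (path, mode) for every mode under an aspect, direct modes first."""
    for element in aspect.elements:
        for mode in element.modes:
            yield f"{aspect.name} > {element.name} > {mode.name}", mode
        for sub in element.sub_elements:
            for mode in sub.modes:
                yield f"{aspect.name} > {element.name} > {sub.name} > {mode.name}", mode


def modes_missing_examples(aspect: Aspect) -> List[str]:
    """Return the paths of modes that have no example yet."""
    return [path for path, mode in iter_modes(aspect) if not mode.examples]


# A fragment of the Bodily Self aspect; example quotes abridged from the Appendix.
bodily_self = Aspect(
    name="Bodily Self",
    definition="All experiences pertaining to one's physical body.",
    elements=[
        Element(
            name="Sense of Identification",
            sub_elements=[
                SubElement(
                    name="Strength",
                    modes=[
                        Mode("Absent", examples=["...since I didn't identify with my body anymore... (DE_E)"]),
                        Mode("Low"),
                        Mode("High", examples=["I identify with the body and with the mental representation. (DE_C)"]),
                    ],
                )
            ],
        ),
        Element(
            name="Sense of Ownership",
            modes=[
                Mode("Absent"),
                Mode("Part of the body", examples=["Or I felt that it was mine (DE_C)"]),
                Mode("Whole body"),
            ],
        ),
    ],
)

for path in modes_missing_examples(bodily_self):
    print("needs an example:", path)
```

Keeping modes either directly under an element or under a sub-element reflects the paper's statement that modes are grouped into sub-elements only "where necessary"; a real knowledge base would likely also store sources and cross-references between aspects, which this sketch leaves out.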
Modeling Nonlinear Change in Psychotherapy: Toward an AI Decision-Support System with Synthetic Client Data

Oskar Šonc, os05793@student.uni-lj.si, Pedagoška fakulteta, Kognitivna znanost, Ljubljana, Slovenia
Rok Smodiš, rs68734@student.uni-lj.si, Pedagoška fakulteta, Kognitivna znanost, Ljubljana, Slovenia
Tine Kolenik, tine.kolenik@ccsys.de, Institute of Synergetics and Psychotherapy Research, Paracelsus Medical University, Salzburg, Austria
Günter Schiepek, guenter.schiepek@ccsys.de, Institute of Synergetics and Psychotherapy Research, Paracelsus Medical University, Salzburg, Austria
Wolfgang Aichhorn, w.aichhorn@salk.at, University Hospital of Psychiatry, Psychotherapy, and Psychosomatics, Salzburg, Austria

Abstract
Psychotherapists typically choose interventions based on limited, session-bound information. We present a partial viability study of an AI decision support system for psychotherapy, which addresses this issue. The system forecasts next-day changes in five synergetic process variables: problem severity (P), therapeutic success (S), motivation (M), emotions (E), and insight (I), and combines these forecasts with phase transition detection to support anticipatory guidance. We created synthetic client personas and simulated daily trajectories for eighty to one hundred days with weekly sessions. Each day included a diary entry that aligns with the simulated state. We extracted features from diaries and session evaluations, including sentiment, readability, syntactic complexity, lexical richness, agreement, and discrepancy between client and therapist ratings. We evaluated Random Forest as the main model, along with Gradient Boosting and Ridge baselines, using splits by client. We also added a Pattern Transition Detection Algorithm (PTDA), which identifies critical fluctuations and potential transitions. Across dimensions, our preliminary results indicate that diary sentiment is the strongest predictor of next-day change. The pipeline demonstrates feasibility and provides a path to interpretable, real-time recommendations. Next steps include clinical validation on real data.

Keywords
decision support, psychotherapy, phase transitions, diary text, synthetic data

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
Information Society 2025, Ljubljana, Slovenia
© 2025 Copyright held by the owner/author(s).
https://doi.org/10.70314/is.2025.cogni.9

1 Introduction
Psychotherapeutic change is nonlinear, often marked by discontinuous shifts rather than steady improvement [1, 2]. Capturing these dynamics requires intensive monitoring, as daily diaries, high-frequency questionnaires like the Therapy Process Questionnaire, and brief session ratings yield time series suitable for detecting transitions [3, 4, 5]. Such data support both retrospective analyses and anticipatory detection, enabling real-time feedback for clinical decisions [4, 5]. Computational decision-support systems (DSS) have been proposed to operationalize this potential, integrating multimodal data to forecast therapeutic shifts and recommend personalized interventions [6]. Forecasts in this context mean short-horizon predictions of the five nonlinear synergetic state variables, which are problem severity (P), therapeutic success (S), motivation to change (M), emotions (E), and insight (I). They evolve daily and influence one another through nonlinear functions [1, 7]. By combining machine learning with interpretable, synergetic modeling, these systems aim to improve the timing and precision of interventions [4, 6]. The present study contributes a partial viability test of such a DSS, focusing on synthetic client data and evaluating a forecasting pipeline across these five dimensions. Our goal is not clinical validation, but methodological feasibility and guidance for future evaluation with real clinical data [1, 4].

This study is part of a broader project carried out by the Institute of Synergetics and Psychotherapy Research at the Paracelsus Medical University Salzburg. The project aims to develop an application that supports psychotherapists by suggesting and explaining personalized interventions across the five state variables (P, S, M, E, I).

2 Related work

2.1 Nonlinear change and intensive time-series in psychotherapy
The synergetic model represents change through five interacting state variables: problem severity (P), therapeutic success (S), motivation to change (M), emotions (E), and insight (I). Their nonlinear coupling produces instabilities and discontinuous transitions. Simulations show positive largest Lyapunov exponents, which imply restricted predictability, and daily self-report with the Therapy Process Questionnaire has been validated for intensive monitoring [7, 2, 4].

2.2 Phase-transition detection and forecasting
Transition-sensitive methods (e.g., PTDA-inspired indicators) detect impending shifts in trajectories; given chaotic dynamics, long-range prediction is infeasible and only short-horizon forecasts are appropriate for applied decision support [5].

2.3 Computational psychotherapy and decision support
Computational DSS integrate multimodal process data (e.g., questionnaires, diaries) with interpretable modeling and therapist-aligned explanations to generate actionable recommendations; our approach anchors these recommendations explicitly in the synergetic five-variable model [6].

2.4 Synthetic data for psychotherapy NLP pipelines
Because psychotherapy data are sensitive, synthetic corpora have been explored; zero-shot generations are often shallow, while few-shot/taxonomy-guided prompting and human-in-the-loop filtering improve fidelity [8, 9].

2.5 Empathy and therapeutic language modeling
Large language model generated therapy dialogues can train empathy detectors: augmenting a Reddit dataset with 420 synthetic dialogues improved F1 by up to 0.10 (exploration 0.48 → 0.53, interpretation 0.32 → 0.48, emotional reaction 0.58 → 0.59), and replacing 50% of the data raised interpretation accuracy from 0.50 to 0.57 while other metrics remained comparable; the study generated 10,464 synthetic dialogues and evaluated on 579 real dialogue pairs [10].

3 Research Objectives
This paper has two aims:
(1) Partial viability study. Demonstrate a working pipeline (data schema, feature extraction, forecasting, recommendation and explanation) that operates on synthetic clients mirroring our planned real-world data collection (five synergetic state variables + diary + pre/post session ratings).
(2) Bridging detection and forecasting. Outline how phase-transition detection (e.g., PTDA and related convergence-validated methods) can be integrated with short-horizon forecasting of the five dimensions to generate anticipatory intervention suggestions (e.g., focus more on S and I when a transition is imminent).
We position this as a pre-study on synthetic data, designed to de-risk methodological choices and inform the design of a pilot with genuine clinical time series.

4 Methodology

4.1 Synthetic Dataset Generation
We first generated a set of client personas with demographic and diagnostic diversity (e.g., gender, age, primary complaint). Personas were created using LLMs guided by structured prompts. For each persona, we initialized the five synergetic state variables, namely problem severity (P), therapeutic success (S), motivation to change (M), emotions (E), and insight/new perspectives (I), from the persona profile, defaulting to near-neutral when unspecified. Daily diary entries complemented the numerical scores and were produced with a fixed prompt using GPT-4o-mini (temperature 0.7), conditioning on the current day’s state and a brief progress note to ensure narrative coherence across days.

For each persona, we simulated daily trajectories of the five synergetic state variables defined in the nonlinear change model [1, 7]. The dynamics evolved with small fixed linear couplings and mild damping plus additive Gaussian noise, with values clipped to the range [-3, 6]. We simulated 80–100 days per client and designated every seventh day as a therapy session, which included structured pre- and post-session evaluations completed by both client and therapist. This procedure produced time series that exhibit variability, occasional instabilities, and realistic recovery trajectories; diary generation was conditioned on the simulated states to align text with day-level changes. Random seeds were fixed for reproducibility.

All outputs were stored in structured JSON files that contained raw trajectories, diary texts, session-day flags, and evaluation ratings. These were subsequently enriched with feature representations to support model training and interpretability.

4.2 Feature Extraction
Features were derived from both diary entries and session evaluations, capturing textual signals and structured ratings that inform downstream forecasting. Diary texts were processed using standard natural language processing pipelines, extracting sentiment scores (VADER, TextBlob), readability indices, syntactic complexity measures, lexical richness and word counts.

Session-day evaluations were transformed into quantitative descriptors by computing mean, variance, and maximum differences across therapist pre-, therapist post-, and client post-session ratings for each of the five synergetic state variables (P, S, M, E, I). Additionally, similarity metrics (cosine similarity, Euclidean distance) were calculated to assess alignment and discrepancy between client and therapist perspectives.

4.3 Model training
Forecasting models were developed to predict next-day scores for each dimension. We trained Random Forest regressors as the primary model due to their robustness and ability to provide interpretable feature importances. In addition, we fit Gradient Boosting regressors as a complementary tree-ensemble with a different bias–variance profile, tuning the number of estimators and learning rate on validation folds. Finally, we included a regularized linear comparator via Ridge Regression; features were standardized before fitting, providing a strong high-bias baseline and an interpretable contrast to the tree models. All models were trained and evaluated under the same grouped-by-client splits and metrics to enable direct comparison.

4.4 Phase-Transition Detection
To assess critical fluctuations in therapeutic trajectories, we implemented a phase-transition detection layer. Peaks in dynamic complexity were flagged as candidate transitions. A PTDA-inspired algorithm was then applied to combine these signals with additional markers, yielding annotated trajectories with transition indicators.

Together, these components operationalize an end-to-end pipeline that simulates client trajectories, extracts features, forecasts next-day changes, and detects phase transitions, producing interpretable outputs suitable for therapist review. The pipeline can be seen in Figure 1.

Figure 1: Project pipeline.

5 Results
Permutation importance analyses revealed that sentiment was consistently the strongest predictor across all dimensions. Figure 2 illustrates this pattern for the Emotions dimension, where sentiment clearly dominated over other text-based features such as readability, syntactic complexity, or lexical diversity. This finding indicates that the emotional valence expressed in daily diary entries was the most informative signal for forecasting short-term changes in therapy process variables.

Figure 2: Predictive power of different diary entry characteristics (example: Emotions dimension).

On a cohort of synthetic clients, the pipeline trains stably and yields face-valid recommendations, exhibiting behavior that matches qualitative expectations across the five dimensions. Random Forest models produced smooth short-horizon predictions for Therapeutic success (S) and Emotion (E), with appropriately higher variance on Motivation and Insight when diaries include inconsistent motivational/insight signals.

We trained Random Forest, Gradient Boosting, and Ridge Regression models to forecast next-day values of the five synergetic state variables on synthetic client data generated from our nonlinear change model. Table 1 summarizes the performance in terms of mean squared error (MSE) and coefficient of determination (R²), reported under the same evaluation protocol for direct comparability. Ridge obtained the lowest MSE (0.188) and the only positive R² (0.01), indicating a small but consistent gain over trivial predictors. Random Forest and Gradient Boosting achieved comparable errors (MSE ≈ 0.22) with negative R², indicating worse performance than a mean (null) baseline on this short-horizon task. For reference, the mean baseline (the R² anchor) has MSE ≈ 0.190, implying that Ridge reduces error by about 1% in aggregate.

Table 1: Therapy data predictions

| Model | MSE | R² |
|---|---|---|
| Random Forest | 0.2208 | -0.138 |
| Gradient Boosting | 0.2176 | -0.151 |
| Ridge Regression | 0.1878 | 0.013 |

6 Discussion

6.1 What would likely change with real data.
Our synthetic-only runs mainly validated the pipeline, but they also favored a carry-forward baseline due to piecewise plateaus and low noise. On real synergetic state variables + diary + session series, we would expect less baseline advantage because of non-stationarity and therapist actions, modest but consistent MAE gains for short-horizon forecasts on P/S/E (with Motivation and Insight remaining more variable), and transition warnings with clinically useful lead times once thresholds are tuned to real fluctuations rather than generator quirks [7, 4, 5, 6].

6.2 Evidence from similar synthetic-vs-real comparisons.
Prior work has shown that synthetic corpora can help, but only when evaluated on real test sets. Cabrera Lozoya, Hernandez Lua, Barajas Perches, Conway, and D’Alfonso [10] generated 10,464 synthetic dialogues and found that augmenting Reddit data improved empathy F1 by up to 0.10, while replacing up to 50% of the organic data preserved or improved performance (e.g., interpretation accuracy 0.57 vs. 0.50) on a 579-pair clinical test set (MOST+ and Alexander Street). This pattern supports our claim that real synergetic state variables + diary data are necessary to calibrate feature weights (e.g., diary sentiment) and transition thresholds reliably.

6.3 Limitations and next steps
Our evaluation used only synthetic labels, a single-client showcase, and no ground-truth transitions, so we could not report precision/recall or lead time. Moreover, synthetic text may encode generator biases, inflating the apparent weight of some features [8, 9]. Next steps are therefore clear: collect real data, define transitions and compare to stronger baselines [5, 6].

7 Conclusion and Future Work
We presented a DSS that organizes session planning around five nonlinear change dimensions and provides explainable, forecast-driven intervention focus suggestions. This partial viability study shows the full pipeline operating on synthetic clients and specifies the next steps: (i) collect pilot synergetic state variables + diary + session micro-data; (ii) integrate a phase-transition layer; (iii) quantitatively evaluate forecasting and recommendation usefulness on real data; (iv) add a block-diagram figure and finalize a web prototype for therapist feedback. Ultimately, our goal is a hybrid system where nonlinear modelling and interpretable ML jointly inform what to focus on next in a given session.

8 Ethical note
This DSS is meant to be assistive only and does not automate clinical decisions or crisis response, emphasizing clinician oversight at all times. All results are based on synthetic data and make no claims of clinical efficacy. Recommendations are meant to be shown to clinicians for judgment only. Data are minimised and access-controlled; if ever in use, model inputs/outputs will be logged for audit.

References
[1] G. Schiepek, B. Aas, and K. Viol, “The mathematics of psychotherapy: A nonlinear model of change dynamics,” Nonlinear Dynamics, Psychology, and Life Sciences, vol. 20, no. 3, pp. 369–399, 2016.
[2] H. Schöller, K. Viol, W. Aichhorn, M.-T. Hütt, and G. Schiepek, “Personality development in psychotherapy: A synergetic model of state-trait dynamics,” Cognitive Neurodynamics, vol. 12, no. 5, pp. 441–459, Jun. 2018, issn: 1871-4099. doi: 10.1007/s11571-018-9488-y [Online]. Available: http://dx.doi.org/10.1007/s11571-018-9488-y
[3] A. M. Hayes, J.-P. Laurenceau, G. Feldman, J. L. Strauss, and L. Cardaciotto, “Change is not always linear: The study of nonlinear and discontinuous patterns of change in psychotherapy,” Clinical Psychology Review, vol. 27, no. 6, pp. 715–723, Jul. 2007, issn: 0272-7358. doi: 10.1016/j.cpr.2007.01.008 [Online]. Available: http://dx.doi.org/10.1016/j.cpr.2007.01.008
[4] G. Schiepek et al., “The therapy process questionnaire - factor analysis and psychometric properties of a multidimensional self-rating scale for high-frequency monitoring of psychotherapeutic processes,” Clinical Psychology & Psychotherapy, vol. 26, no. 5, pp. 586–602, Jul. 2019, issn: 1099-0879. doi: 10.1002/cpp.2384 [Online]. Available: http://dx.doi.org/10.1002/cpp.2384
[5] G. Schiepek et al., “Convergent validation of methods for the identification of psychotherapeutic phase transitions in time series of empirical and model systems,” Frontiers in Psychology, vol. 11, Aug. 2020, issn: 1664-1078. doi: 10.3389/fpsyg.2020.01970 [Online]. Available: http://dx.doi.org/10.3389/fpsyg.2020.01970
[6] T. Kolenik, G. Schiepek, and M. Gams, “Computational psychotherapy system for mental health prediction and behavior change with a conversational agent,” Neuropsychiatric Disease and Treatment, vol. 20, pp. 2465–2498, Dec. 2024, issn: 1178-2021. doi: 10.2147/ndt.s417695 [Online]. Available: http://dx.doi.org/10.2147/NDT.S417695
[7] G. K. Schiepek et al., “Psychotherapy is chaotic—(not only) in a computational world,” Frontiers in Psychology, vol. 8, Apr. 2017, issn: 1664-1078. doi: 10.3389/fpsyg.2017.00379 [Online]. Available: http://dx.doi.org/10.3389/fpsyg.2017.00379
[8] Z. Li, H. Zhu, Z. Lu, and M. Yin, “Synthetic data generation with large language models for text classification: Potential and limitations,” in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 2023. doi: 10.18653/v1/2023.emnlp-main.647 [Online]. Available: http://dx.doi.org/10.18653/v1/2023.emnlp-main.647
[9] V. Veselovsky, M. H. Ribeiro, A. Arora, M. Josifoski, A. Anderson, and R. West, Generating faithful synthetic data with large language models: A case study in computational social science, 2023. doi: 10.48550/ARXIV.2305.15041 [Online]. Available: https://arxiv.org/abs/2305.15041
[10] D. Cabrera Lozoya, E. Hernandez Lua, J. A. Barajas Perches, M. Conway, and S. D’Alfonso, “Synthetic empathy: Generating and evaluating artificial psychotherapy dialogues to detect empathy in counseling sessions,” in Proceedings of the 10th Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2025), Association for Computational Linguistics, 2025, pp. 157–171. doi: 10.18653/v1/2025.clpsych-1.13 [Online]. Available: http://dx.doi.org/10.18653/v1/2025.clpsych-1.13
What Words Reveal About Mental Health: A Computational Language Analysis Around Phase Transitions in Psychotherapy

Mateja Šutar, University of Ljubljana, Ljubljana, Slovenia, mateja.sutar@gmail.com
Tine Kolenik, Institute of Synergetics and Psychotherapy Research, Paracelsus Medical University, Salzburg, Austria, tine.kolenik@ccsys.de
Günter Schiepek, Institute of Synergetics and Psychotherapy Research, Paracelsus Medical University, Salzburg, Austria, guenter.schiepek@ccsys.de
Wolfgang Aichhorn, University Hospital of Psychiatry, Psychotherapy, and Psychosomatics, Salzburg, Austria, w.aichhorn@salk.at

Abstract
Language can reflect key psychological changes during psychotherapy, known as phase transitions (PTs). These sudden shifts in mood, insight, or symptom severity are often expressed in clients' written narratives. We investigated how linguistic features in client diaries relate to PTs by combining textual data with clinical assessments. Feature changes were analyzed using within-participant comparisons and aggregated group-level analysis. Results revealed systematic shifts in word count, pronoun use, and psychological processes-related terms surrounding PTs. These findings may offer additional insight into therapeutic progress and support the development of novel interventions.

Keywords
language use, linguistic shifts, LIWC, phase transitions, psychotherapy, mental health

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
Information Society 2025, 6–10 October 2025, Ljubljana, Slovenia
© 2025 Copyright held by the owner/author(s).
http://doi.org/10.70314/is.2025.cogni.7

1 Introduction
Language is first and foremost a tool for communication, enabling humans to share ideas, emotions, and knowledge [1]. In turn, everyday language carries subtle cues about our psychological states, which researchers have long analyzed to gain insight into thought and behavior. Beyond its role in communication, linguistic behavior reflects underlying mechanisms of attention, affect regulation, and self-concept, making it an increasingly valuable marker in psychology [2]. Recent advances in computational linguistics have demonstrated that distinctive linguistic patterns can serve as proxies for a wide range of mental distress [3], and even psychiatric diagnoses [4]. Thus, language is not only a medium for therapeutic exchange but also a temporal reflection of a person's mental change.

A growing body of research conceptualizes psychotherapy as a complex dynamic system in which sudden, discontinuous changes, commonly referred to as phase transitions (PTs), signal shifts in a client’s psychological state. Such transitions may involve sudden alterations in affective tone, the emergence of new insights, or changes in symptom severity [5]. While quantitative time-series approaches, such as the analysis of questionnaires, have shed light on the temporal dynamics of PTs, far less is known about how these key points are manifested in patients’ own narratives. Diary writing, in particular, provides a rich, ecologically valid record of subjective experience, yet the systematic study of its content during psychotherapy remains limited.

Our work addresses this gap by applying computational linguistic methods to patient diaries collected during inpatient psychiatric treatment. Specifically, we examine whether linguistic features change systematically around clinically identified PTs. By integrating text analysis with validated psychometric methods, we aim to explore the content of psychological transitions.

2 Methods

2.1 Participants and Dataset
Our research initially included 28 clients undergoing inpatient psychotherapy; however, one case was excluded due to missing data around phase transitions, resulting in a final sample of 27 anonymized clients. The duration of data collection for each client ranged from 74 to 154 consecutive days of hospitalization, with an average length of 88.3 days. The dataset consisted of daily client diary entries alongside Therapy Process Questionnaire (TPQ) results annotated with clinically determined PTs. In total, 102 PTs were identified, corresponding to a mean of 3.5 PTs per client. The number of PTs per participant ranged from 0 to 5, with all but one participant exhibiting at least one PT. All diary entries were written in German. Participants entered their diary data digitally via PCs, tablets, or smartphones, with no mention of specific instructions regarding length, content, or frequency beyond daily reporting. The TPQ is a validated self-report measure designed to capture fluctuations in therapeutic progress and symptomatology. Clinical experts independently identified PTs by detecting discontinuities in the TPQ time series. These PTs served as reference points around which we examined changes in language use, allowing us to investigate how linguistic patterns correspond to shifts in clients’ psychological states.

2.2 Text preprocessing and Feature Extraction
Diary entries were analyzed using the Linguistic Inquiry and Word Count (LIWC) application [6], which classified words into psychologically relevant categories (e.g., emotion, cognitive processes, time orientation).
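As a rough illustration of what dictionary-based classification of this kind involves, the following minimal sketch counts how many words of a diary entry fall into each category and reports the result as a percentage of all words. It is not the LIWC implementation: `TOY_DICTIONARY`, its three categories and entries, the tokenizer, and the trailing-`*` stem convention are hypothetical stand-ins chosen for illustration, and reporting percentages is an assumption about one common convention.

```python
import re
from collections import Counter

# Hypothetical toy dictionary for illustration only (NOT the LIWC lexicon).
# A trailing "*" marks a stem that matches any word beginning with it.
TOY_DICTIONARY = {
    "reward": ["win", "gain*", "benefit*"],
    "negative_emotion": ["hate", "bad", "hurt", "tired"],
    "past_focus": ["was", "had", "were", "been"],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-zäöüß']+", text.lower())


def matches(word, entry):
    """True if word equals the entry, or starts with the stem when the entry ends in '*'."""
    if entry.endswith("*"):
        return word.startswith(entry[:-1])
    return word == entry


def category_percentages(text, dictionary=TOY_DICTIONARY):
    """Return, per category, the percentage of tokens matching any of its entries."""
    tokens = tokenize(text)
    counts = Counter()
    for word in tokens:
        for category, entries in dictionary.items():
            if any(matches(word, e) for e in entries):
                counts[category] += 1
    total = len(tokens)
    # An empty entry yields 0.0 for every category instead of dividing by zero.
    return {c: (100.0 * counts[c] / total if total else 0.0) for c in dictionary}


if __name__ == "__main__":
    entry = "I was tired and had a bad day, but the small gains were a win."
    print(category_percentages(entry))
```

Expressing each category as a share of the entry's words, rather than as a raw count, keeps long and short entries comparable, which matters when entry length varies widely between writers.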
This procedure yielded 117 extracted features per diary entry, representing both linguistic dimensions (e.g., pronoun use, total function words) and psychological processes (e.g., emotion, cognition, drives). To account for interindividual variability in diary length, all features were normalized as relative frequencies.

2.3 Statistical analysis
To examine linguistic change in the context of PTs, we defined temporal windows of 3, 5, and 7 calendar days before and after each clinically identified transition. At present, there is little empirical guidance on how to determine the appropriate time frame for detecting language shifts during psychotherapy. Prior research on linguistic responses to traumatic events, however, suggests that linguistic changes are often immediate but short-lived. For instance, following the 9/11 attacks, the diaries of an on-line journaling service revealed sharp increases in negative emotion, cognitive engagement, and social referencing that largely returned to baseline within about a week [7]. Drawing on this evidence, we adopted multiple window sizes to capture both short-term and extended dynamics surrounding PTs, as visualized in Figure 1.

Two levels of analysis were performed:

Within-participant analysis: For each participant, we compared pre- and post-transition feature distributions using the Wilcoxon Signed-Rank Test, a nonparametric test suitable for paired, non-normally distributed data [8]. Given the exploratory nature of this analysis, we adopted a liberal threshold (p < 0.15). Each PT was treated separately rather than averaging across a participant’s multiple PTs, allowing us to capture transition-specific dynamics.

Aggregated group-level analysis: To identify consistent patterns across participants, pre- and post-transition feature values were aggregated across participants and tested using the Wilcoxon Rank-Sum Test (p < 0.05). This approach allowed us to examine group-level patterns, leveraging the summaries from each PT.

By combining individual- and group-level analyses, we aimed to capture both within-person change processes and shared linguistic dynamics indicative of psychotherapeutic turning points.

Figure 1: Visualization of linguistic shifts around a client’s phase transition (PT). This figure shows shifts in linguistic features (Sentiment Negative, Impersonal Pronouns, Motion) tracked over 90 days of psychotherapy. Red dashed lines mark a PT identified through clinical assessment. Shaded regions represent temporal analysis windows of 3 (violet), 5 (green), and 7 days (orange) before and after each PT. The plots illustrate how different linguistic features may exhibit distinct patterns of change around the same turning point. To illustrate, diary entries corresponding to this specific PT shifted from “Today was a very exhausting day… I notice that I have trouble concentrating…” (PT−1) to “I tried slacklining for the first time… It makes me focus completely and the little successes feel amazing.” (PT+5), exemplifying the qualitative change in language accompanying the transition.

3 Results
We found no observable changes in linguistic features within the 3-day window in the within-participant analysis. Conversely, several linguistic features showed consistent changes across both the 5-day and 7-day windows. At 5 days, the most frequent individual shifts involved average sentence length (19 PTs, 15 drops, 4 gains), the total number of pronouns (15 PTs, 8 drops, 7 gains), negative emotion (14 PTs, 10 drops, 4 gains), and drives (14 PTs, 9 drops, 5 gains), while the 7-day window showed most frequent changes in all punctuation (18 PTs, 6 drops, 12 gains), average sentence length (17 PTs, 13 drops, 4 gains), word count (17 PTs, 11 drops, 6 gains), and certainty (17 PTs, 8 drops, 9 gains).

Aggregated analysis showed some shared patterns for 5-day and 7-day windows. An overview of the results is presented in Table 1. It includes decreases in achievement- (Δ median -1.64 pp, |r|=0.22, q=0.0863 for 5-day window; Δ median -2.30 pp, |r|=0.37, q=0.000038 for 7-day window), work- (Δ median -1.32 pp, |r|=0.39, q=0.0065 for 5-day window; Δ median -1.44 pp, |r|=0.27, q=0.0077 for 7-day window), feeling-, female-, and power- terms, as well as increases in adverbs (Δ median 1.44 pp, |r|=0.23, q=0.0725 for 5-day window; Δ median 1.50 pp, |r|=0.22, q=0.019 for 7-day window), past focus (Δ median 2.35 pp, |r|=0.35, q=0.0025 for 5-day window; Δ median 3.98 pp, |r|=0.22, q=0.028 for 7-day window), home-terms, and 1st person plural expressions. Unique to the 5-day window were decreases in affect (Δ median -4.19 pp, |r|=0.42, q=0.0021), impersonal pronouns (Δ median -1.96 pp, |r|=0.40, q=0.0021), negative emotion (Δ median -1.18 pp, |r|=0.24, q=0.036), articles, comma use, and reward-terms, while the 7-day window alone showed decreases in drives (Δ median -2.94 pp, |r|=0.20, q=0.023), and discrepancy-terms (Δ median -0.63 pp, |r|=0.14, q=0.12). Increases in differentiation- (Δ median 1.59 pp, |r|=0.22, q=0.073), family- (Δ median 0.40 pp, |r|=0.31, q=0.074), and money-related terms were specific to the 5-day window, while increases in positive emotion (Δ median 5.31 pp, |r|=0.40, q=0.0049), negative emotion (Δ median 1.11 pp, |r|=0.18, q=0.036), anger (Δ median 0.29 pp, |r|=0.23, q=0.023), personal pronouns (Δ median 3.58 pp, |r|=0.34, q=0.0076), prepositions, conjunctions, negations, netspeak, and time-terms were unique to the 7-day window.

Table 1: Aggregated analysis results

| Category | Most frequently used examples | Direction (Gain ↑ / Drop ↓) | Time-window |
|---|---|---|---|
| Work | work, school, working, class | ↓ | 5 & 7 days |
| Achievement | work, better, best, working | ↓ | 5 & 7 days |
| Feeling | feel, hard, cool, felt | ↓ | 5 & 7 days |
| Power | own, order, allow, power | ↓ | 5 & 7 days |
| Female | she, her, girl, woman | ↓ | 5 & 7 days |
| Adverbs | so, just, about, there | ↑ | 5 & 7 days |
| Home | home, house, room, bed | ↑ | 5 & 7 days |
| Past focus | was, had, were, been | ↑ | 5 & 7 days |
| 1st person plural | we, our, us, lets | ↑ | 5 & 7 days |
| Negative emotion | hate, bad, hurt, tired | ↓ | 5 days |
| Impersonal pronouns | that, it, this, what | ↓ | 5 days |
| Affect | emotion, mood | ↓ | 5 days |
| Articles | a, an, the, alot | ↓ | 5 days |
| Reward | opportun*, win, gain*, benefit* | ↓ | 5 days |
| Comma | | ↓ | 5 days |
| Differentiation | but, not, if, or | ↑ | 5 days |
| Family | parent*, mother*, father*, baby | ↑ | 5 days |
| Money | business*, pay*, price*, market* | ↑ | 5 days |
| Discrep | would, can, want, could | ↓ | 7 days |
| Drives | we, our, work, us | ↓ | 7 days |
| Negative emotion | hate, bad, hurt, tired | ↑ | 7 days |
| Positive emotion | good, love, happy, hope | ↑ | 7 days |
| Anger | hate, mad, angry, frustr* | ↑ | 7 days |
| Time | when, now, then, day | ↑ | 7 days |
| Personal pronouns | I, you, my, me | ↑ | 7 days |
| Negations | not, no, never, nothing | ↑ | 7 days |
| Prepositions | to, of, in, for | ↑ | 7 days |
| Conjunctions | and, but, so, as | ↑ | 7 days |
| Netspeak | :), u, lol, haha* | ↑ | 7 days |

4 Discussion
Our results indicate that measurable language changes occur around phase transitions in clients undergoing psychotherapy. These changes, particularly in content categories, can provide insight into the psychological processes associated with such transitions. Because data were aggregated across diverse participants, the observed patterns were heterogeneous: some participants showed improvement, while others experienced deterioration. This variability likely accounts for the simultaneous increases in both positive and negative emotion features in the aggregated data. Thus, apparent contradictions in directionality may reflect mixed individual trajectories, as the analysis was not grouped by phase transition type.

In our results, several function word categories (such as articles, prepositions, personal pronouns, impersonal pronouns, conjunctions, adverbs, and negations) were also observed. These terms, along with auxiliary verbs, are used in the Analytical Thinking feature, also known as the Categorical-Dynamic Index (CDI) [9], which is a metric of logical thinking. Studies revealed that the CDI reflects students’ thinking style and is linked to differences in academic performance [10].

4.1 Language Characteristics of Distinct Mental Health Disorders
Previous studies have documented that different mental health disorders are associated with distinct patterns of language use. For example, ADHD is linked to more third-person plural pronouns and shorter clauses [11, 12], while bipolar disorder shows greater self-focus and references to death [13]. Borderline personality disorder (BPD) involves more swear words, death-related words, and third-person singular pronouns [3]. Individuals with social anxiety disorder (SAD) used self-referential, anxiety, and sensory words, and made fewer references to other people [14]; Major depressive disorder (MDD) involves first-person pronouns, past tense, and repetitive, short sentences [15]. Schizophrenia relates to low semantic cohesion, anger- and religion-related words, references to auditory hallucinations, while also characterized by decreased usage of words related to work, friends, and health [3, 16].

4.2 LIWC Analysis
LIWC is a popular top-down method that offers several advantages for the study of language and cognition. It is a standardized, replicable, and efficient method for quantifying large volumes of textual data to extract psychologically relevant and psychometrically valid measures from language [2, 3]. Top-down methods are based on “dictionaries,” categories of words or phrases, each associated with a given construct or set of constructs, such as anxiety or suicidal ideation [2]. This enables researchers to detect subtle emotional and cognitive dynamics that may not be captured with traditional self-report measures, making it a powerful complement to other assessment tools.

4.2.1 Top-down vs. Bottom-up Methods. Top-down methods, while highly structured, may sometimes overlook context-specific, cultural, or metaphorical nuances [2]. Bottom-up approaches, by contrast, focus on broader patterns in language rather than predefined constructs. Techniques such as probabilistic topic models [17], statistical semantic models [18], and neural language models [19] capture characteristics ranging from word co-occurrence and meaning to sequential dependencies.

Combining top-down, bottom-up, and qualitative approaches enables a highly nuanced and insightful analysis of textual data. This integrated strategy allows researchers not only to quantify specific psychological constructs but also to examine emergent patterns, contextual nuances, and complex semantic structures, providing a comprehensive understanding of language use and its psychological implications [2].

4.3 Limitations
Interpretation of our findings is limited by the absence of information about clients’ diagnoses and annotations regarding the nature of phase transitions, indicating whether the transition represents improvement or worsening of symptoms. Other limitations include heterogeneity of participants, contextual limitations of LIWC, and the absence of fine-grained temporal resolution.

5 Conclusion
Our research suggests that language shifts hold potential as indicators of psychological change. Understanding these patterns

[4] Marco Spruit, Stephanie Verkleij, Kees de Schepper, and Floortje Scheepers. Exploring language markers of mental health in psychiatric stories. Appl. Sci. 12, 4 (Feb. 2022), 1–17. DOI:https://doi.org/10.3390/app12042179
[5] Günter K. Schiepek, Kathrin Viol, Wolfgang Aichhorn, Marc-Thorsten Hütt, Katharina Sungler, David Pincus, and Helmut J. Schöller. Psychotherapy is chaotic—(not only) in a computational world. Front. Psychol. 8 (Apr. 2017), 379. DOI:https://doi.org/10.3389/fpsyg.2017.00379
[6] Ryan L. Boyd, Ashwini Ashokkumar, Sarah Seraj, and James W. Pennebaker. The development and psychometric properties of LIWC-22. University of Texas at Austin, Austin, TX. https://www.liwc.app
[7] Michael A. Cohn, Matthias R. Mehl, and James W. Pennebaker. Linguistic markers of psychological change surrounding September 11, 2001. Psychol. Sci. 15, 10 (Oct. 2004), 687–693. DOI:https://doi.org/10.1111/j.0956-7976.2004.00741.x
[8] Bernard Rosner, Robert J. Glynn, and Mei-Ling T. Lee. The Wilcoxon signed rank test for paired comparisons of clustered data. Biometrics 62, 1 (Mar. 2006), 185–192. DOI:https://doi.org/10.1111/j.1541-0420.2005.00389.x
[9] Boban Simonovic, Katia Correa Vione, Edward Stupple, and Alice Doherty. It is not what you think it is how you think: A critical thinking intervention enhances argumentation, analytic thinking and metacognitive sensitivity. Think. Skills Creat. 49 (Jun. 2023), 101362. DOI:https://doi.org/10.1016/j.tsc.2023.101362
[10] James W. Pennebaker, Cindy K. Chung, Joey Frazee, Gary M. Lavergne, and David I. Beaver. When small words foretell academic success: The case of college admissions essays. PLoS One 9, 12 (Dec. 2014), e115844. DOI:https://doi.org/10.1371/journal.pone.0115844
[11] Glen Coppersmith, Mark Dredze, Craig Harman, and Kristy Hollingshead. From ADHD to SAD: Analyzing the language of mental health on Twitter through self-reported diagnoses. 2015. In Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, CLPsych 2015.
June 5, 2015, may provide clinicians with more sensitive indicators of Denver. Association for Computational Linguistics, Denver, CO, 1 – 10. therapeutic progress, offering potential guidance for DOI:https://doi.org/10.3115/v1/W15-1201 interventions, and improving the precision of treatment [12] Kyungil Kim, Seongjik Lee, and Changhwan Lee. College students with monitoring in inpatient psychiatric care. ADHD traits and their language styles. J. Atten. Disord. 19, 8 (Aug. 2015), 687–693. DOI:https://doi.org/10.1177/1087054712452343 [13] Marie Forgeard. Linguistic styles of eminent writers suffering from 6 unipolar and bipolar mood disorder. Creat. Res. J. 20, 1 (Feb. 2008), 81– Future Work 92. DOI:https://doi.org/10.1080/10400410701842094 Future research could implement transformer-based neural [14] Barrett Anderson, Philippe R. Goldin, Keiko Kurita, and James J. Gross. Self-representation in social anxiety disorder: Linguistic analysis of network architectures (e.g., BERT, RoBERTa) to cluster autobiographical narratives. Behav. Res. Ther. 46, 10 (Oct. 2008), 1119 – participants according to symptom trajectories, such as 1125. improvement or deterioration. Analyses could then be conducted DOI:https://doi.org/10.1016/j.brat.2008.07.001 to examine differences in linguistic shifts across clusters. Where [15] Raluca N. Trifu, Bogdan Nemeş, Carolina Bodea-Hategan, and Doina available, results from neural language models could be Cozman. Linguistic indicators of language in major depressive disorder (MDD): An evidence-based research. J. Evid.-Based Psychother. 17, 1 compared with clinical annotations to evaluate prediction (Mar. 2017), 105–128. accuracy. Future studies should aim to link these linguistic [16] Michael L. Birnbaum, Sindhu K. Ernala, A. F. Rizvi, Elizabeth Arenare, patterns more directly to specific mental states, ultimately Anna Van Meter, M. De Choudhury, and J. M. Kane. 
Detecting relapse in youth with psychotic disorders utilizing patient-generated and patient- supporting the development of clinically relevant interventions contributed digital data from Facebook. NPJ Schizophr. 5, 1 (Dec. 2019), and applications. 17. DOI:https://doi.org/10.1038/s41537-019-0085-9 Acknowledgments [17] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet allocation. J. Mach. Learn. Res. 3 (Mar. 2003), 993–1022. This research was supported by Paracelsus Medical University, [18] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed representations of words and phrases and their which also provided access to the clinical dataset utilized in this compositionality. In Proceedings of the 27th International Conference on study. The language of this paper was revised with the assistance Neural Information Processing Systems – Volume 2 (NIPS’13), December 5 - 10, 2013, Lake Tahoe Nevada. Curran Associates Inc., Red Hook, NY, of ChatGPT-5. USA, 3111–3119. [19] Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. References A neural probabilistic language model. J. Mach. Learn. Res. 3 (Mar. 2003), 1137–1155. [1] Evelina Fedorenko, Steven T. Piantadosi, and Edward A. F. Gibson. Language is primarily a tool for communication rather than thought. Nature 630, 8017 (Jul. 2024), 575–586. DOI: https://doi.org/10.1038/s41586-024-07522-w [2] [2] Brendan Kennedy, Ashwini Ashokkumar, Ryan L. Boyd, and Morteza Dehghani. 2022. Text analysis for psychology: Methods, principles, and practices. In Handbook of Language Analysis in Psychology. Morteza Dehghani and Ryan L. Boyd (Eds.). The Guilford Press, New York, NY. [3] Minna Lyons, Nazli D. Aksayli, and Gayle Brewer. Mental distress and language use: Linguistic analysis of discussion forum posts. Comput. Hum. Behav. 87 (Oct. 2018), 207–211. 
DOI:https://doi.org/10.1016/j.chb.2018.05.035 55 Measuring Therapist–Client Synchrony to Forecast Change Dynamics: EMA-based Protocol Pilot ∗ ∗ Matej Vajda Tine Kolenik Tatjana Rožič matej.vajda@mail.sfu.ac.at Paracelsus Medical University Sigmund Freud University Vienna - Sigmund Freud University Vienna - Salzburg, Austria Ljubljana branch Ljubljana branch tine.kolenik@ccsys.de Ljubljana, Slovenia Ljubljana, Slovenia tatjana.rozic@sf u- ljubljana.si Nuša Kovačević Tojnko Gašper Slapničar Miran Možina Outpatient Mental Health Clinic Jozef Stefan Institute Sigmund Freud University Vienna - Pamina Ljubljana, Slovenia Ljubljana branch Maribor, Slovenia gasper.slapnicar@ijs.si Ljubljana, Slovenia nusa@pamina.si miramozinaslo@gmail.com Günter Schiepek Wolfgang Aichhorn Paracelsus Medical University Paracelsus Medical University Salzburg, Austria Salzburg, Austria guenter.schiepek@ccsys.de w.aichhorn@salk.at Abstract 1 Introduction We examine the feasibility and utility of a therapist–client mon- Mental disorders contribute substantially to the global burden itoring protocol based on Ecological Momentary Assessment of disease. Recent estimates suggest that 1 in 8 people were ∼ (EMA), designed to detect synchrony and forecast change dy- living with a mental disorder in 2019, with sustained growth in namics in routine psychotherapy. Using the Synergetic Naviga- disability-adjusted life years through 2021, underscoring the need tion System (SNS), we combined daily client reports with brief for scalable, higher-resolution care processes [15]. Psychotherapy pre-/post-session questionnaires from therapists and clients. 
N=7 remains a cornerstone of treatment; beyond specific techniques, (3 therapists, 4 clients) participated over 4–9 weeks, complet- robust evidence points to common therapeutic factors, especially ing daily TPQ-SA surveys and pre/post EMPIS-Q ratings; end- the working alliance—being consistently linked to outcomes [25, of-study evaluations assessed feasibility and user experience. 6]. Feedback-oriented psychotherapy (routine outcome monitor- Usability and perceived data safety were rated highly, while per- ing and in-session process feedback) attempts to surface change ceived usefulness was mixed. Clients often experienced EMA as signals early enough for course corrections, though effect sizes obligatory and of limited immediate value; therapists noted miss- vary by context and tool [4, 14]. Parallel advances in mobile ing alliance items and requested side-by-side access to clients’ sensing and AI (e.g., digital phenotyping, multimodal model- post-session responses. Notification glitches and limited uptake ing) enable dense, real-world measurement and interpretation of of feedback interviews further reduced engagement. Findings change processes at the individual (idiographic) level [3, 9]. indicate that daily and session-based monitoring is feasible, but The broader study we are building toward leverages these its value depends on workflow integration, a stronger relational trends via a digital-twin approach that fuses session audio/video focus, and reliable implementation. The very small sample and and physiology with ecological momentary assessments (EMA) reliance on self-report limit generalizability. Future work will run to detect interpersonal synchrony, forecast tipping points, and a larger feasibility trial, refine questionnaires (including alliance inform just-in-time guidance to therapists [8]. 
items and paired therapist–client views), and pilot multimodal This paper reports a focused pre-study, an EMA-based proto- synchrony measures (session audio/video and physiology) to- col pilot in three therapists and four clients—primarily to validate ward scalable process–outcome monitoring. instruments, apps, and workflows, and to de-risk the method- ological core for the larger study. Keywords psychotherapy, change dynamics, synergetic navigation system, 2 Related Work protocol, pilot High-frequency, routine monitoring of psychotherapy using the Synergetic Navigation System (SNS) and the Therapy Process Questionnaire (TPQ) has been piloted in several small studies. ∗ Both authors contributed equally to this text. Concept and feasibility papers report that equidistant, daily self- ratings can be integrated into clinical settings with good com- pliance, especially when coupled to regular feedback sessions. Permission to make digital or hard copies of all or part of this work for personal Case-based work in suicide prevention, for example, showed or classroom use is granted without fee provided that copies are not made or near-perfect adherence across a 90-day period and emphasized distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this that structured feedback, rather than the mere act of monitoring, work must be honored. For all other uses, contact the owner /author(s). appears critical for sustained engagement. More recent feasibil- Information Society 2025, Ljubljana, Slovenia ity studies using personalized, daily process items in outpatient © 2025 Copyright held by the owner/author(s). https://doi.org/10.70314/is.2025.cogni.10 populations similarly found high perceived usefulness, alongside 56 Information Society 2025, 6–10 October 2025, Ljubljana, Slovenia Vajda et al. 
caution about burden over longer durations (i.e., signs of response fatigue). Together, these pre-studies suggest feasibility in routine care, but also highlight the importance of how data are fed back to patients and clinicians, so they can adjust focus or goals before progress falters or therapy discontinues [19, 23, 5, 12].

A second stream of pre-studies has tested EMA and feedback as an adjunct to therapy. In depressed outpatients, randomized trials of experience-sampling with weekly, personalized feedback reported that the add-on was both feasible and associated with symptom improvements versus controls; follow-on protocols have focused on pragmatic implementation and personalization. Broader EMA reviews in mental health consistently conclude that adherence is acceptable when sampling is purposeful, notifications are reliable, and feedback is built in, while also warning that implementation details (timing, prompt load, and integration into care) strongly determine perceived value [10, 1, 13].

Finally, dyadic, session-by-session monitoring has been moving toward “dual-perspective” designs that track both sides of the interaction (e.g., alliance after each session) and complement end-of-treatment measures with qualitative interviews. These studies foreground the clinical utility of post-session reflections and alliance tracking, elements our pilot also probes via therapist pre/post 5-factor ratings and client post-session reports alongside daily EMA [16].

3 Research Objectives

Overall aim of the larger program: To model psychotherapeutic change as a nonlinear, dyadic system and to forecast clinically actionable dynamics (e.g., ruptures, sudden gains) by integrating daily EMA with session-level synchrony and standardized questionnaires, grounded in Schiepek’s five-factor model of change (EMPIS: emotions, motivation to change, therapeutic progress/success, insight, problem severity) [18]. To correlate observable behavioural and physiological signals, alongside textual data (transcripts, diaries), to the aforementioned change dynamics, using state-of-the-art multi-modal deep learning approaches.

Specific objectives of this pre-study:
i) Feasibility and fidelity. Verify day-to-day adherence, app stability, and data completeness for therapist/client EMA and session questionnaires (therapist pre/post “planned vs. enacted” five-factor ratings; client post-session intervention perceptions);
ii) User experience and ethics. Collect end-of-month evaluations from both groups to surface burden, privacy, and workflow issues to remediate before scale-up (and align with transparency and equity safeguards) [3].

4 Materials and Methods

4.1 Ecological Momentary Assessment (EMA)

We used EMA to obtain equidistant, high-frequency measurements of clients’ therapy-process states during everyday life. Limiting assessment to in-session ratings risks irregular or low-frequency sampling; in contrast, brief daily EMA increases ecological validity and yields time series suitable for detecting nonlinear change features and feeding back clinically meaningful signals [20]. In this pilot, clients completed one short smartphone survey per day for approximately one month, targeting core state variables and change-relevant markers aligned with the five-factor framework. This design supports forecasting/early-warning analyses and provides material for collaborative reflection in subsequent sessions [20].

4.2 Synergetic Navigation System (SNS)

Data collection and real-time monitoring were implemented with the SNS, a secure, web-based platform that schedules questionnaires at arbitrary intervals, supports Likert-type and V/A inputs across devices, and visualizes raw time-series for therapist and client use. It includes built-in analyses enabling process-oriented feedback and individualized decision-making [20, 21]. In this study we deployed the shortened TPQ for ambulatory use (TPQ-SA) for clients’ daily reports and custom five-factor pre-/post-session forms for therapists and clients; accounts were protected via HTTPS with anonymized usernames and passwords, and outputs were available for optional feedback discussions [20, 21].

4.3 EMPIS questionnaire (EMPIS-Q)

The therapist pre-/post-session instruments were developed via a theory-driven item-generation and expert-consensus workflow grounded in Schiepek’s five-factor change model EMPIS [18]. Concretely, three domain experts independently drafted candidate items to operationalize each factor for a pre-session “planned influence” pass and a post-session “realized influence/valence” pass. We then conducted iterative expert panel review to judge content relevance, clarity, and redundancy, reconciling wording by consensus, an approach consistent with standard content-validity procedures (e.g., expert-judge review, or modified Delphi-style consensus). Drafts were piloted internally to check interpretability and response burden, and response formats were simplified to Likert-type scales to fit the session workflow. Because the study operated bilingually, the final Slovenian versions were translated and back-translated to secure conceptual equivalence before deployment. This sequence aligns with recommended steps in scale development (theory-driven item generation → expert review for content validity → small-scale pretest) and with cross-cultural adaptation guidelines [2, 11].

4.4 Single-item Outcome Measure

EPO-1 is a single-item instrument evaluating the respondent’s current emotional and psychological well-being [7]. It is used dimensionally with a visual analog scale (0: "Very poorly; I can barely manage to deal with things" to 100: "Very well; I have no important complaints").

4.5 Therapy Process Questionnaire – Short Ambulatory Use (TPQ-SA)

Daily measurements using the TPQ-SA [21, 17] (shortened version with 24 items for ambulatory use) yield time series data of psychotherapies that allow for capturing and identifying diversity and complexity of cases, as well as critical instabilities and nonstationarities. Unpredictability and complexity of change processes thus make close monitoring important.

4.6 Evaluation Questionnaire

Upon completion of data collection, tailored evaluation questionnaires were distributed to therapists and clients to gather feedback on study participation. Sections covered general research information (use of personal data, voluntariness, support availability, and need for additional information or training; for clients, also how the therapist presented the research objectives), therapist evaluation of pre- and post-session questionnaires (clarity, relevance of the five client variables, perceived influence on session conduct, question quantity, and post-session usefulness; with space for comments), client evaluation of post-session and daily questionnaires (clarity, contribution to session comprehension, relevance, question quantity, response difficulty, utility, completion time, app prompt suitability, reference period, mode of completion, feedback interview experience if relevant, and comments), user experience (app/website usability, input preferences, timing, technical issues), and demographics (gender, age range). A separate free-text field allowed participants to provide additional comments to the research team.

4.7 Participants

The sample comprised two groups: three therapists, who were selected through convenience sampling based on prior knowledge and experience with SNS, and their four clients, who were identified through snowball sampling from therapists’ current caseloads, with inclusion criteria of an established therapeutic alliance and therapist-assessed likelihood of consent to participate (see Table 1). Some authors also served as therapists in this study. Participation was voluntary and was not compensated. All participants signed an informed consent form.

Table 1: Participants and Data Collection Overview

Therapist  Gender  Age Range  Client  Gender  Age Range  Feedback Interviews  No. of Sessions  Daily TPQ Duration
T1         M       35-45      C1      M       35-45      1                    5                9 weeks
                              C2      F       25-35      0                    4                9 weeks
T2         F       45-55      C3      F       45-55      0                    5                4 weeks
T3         F       35-45      C4      F       25-35      0                    4                5 weeks
Total                         4                          1                    18

4.8 Feedback Interviews

In feedback interviews, therapist and client review visualisations of collected questionnaire data, with the therapist following rather than interpreting [22].

4.9 Procedure

EMPIS-Q was adapted for pre- and post-session use, translated into Slovenian, reviewed by four authors, and back-translated for accuracy. The Slovenian translation of the TPQ-SA was employed. All measures were implemented in the SNS. The visual analogue scale was replaced with Likert scales to assess perceived influence and valence, and a free-text field was added for optional comments. The EPO-1 was later included in the post-session client questionnaire.

Therapists familiar with SNS were invited (four approached; three consented). The research team did not rehearse client presentations or review questionnaires with them, and conducting feedback interviews was recommended (not mandatory). Therapists recruited clients, explained the study, obtained written informed consent, introduced SNS and questionnaires, and forwarded installation instructions and login credentials provided by the research team, which also activated client accounts in SNS. During data collection, clients received daily smartphone notifications to complete the TPQ-SA.

At study completion, EMPIS-Qs were deactivated; however, therapist–client dyads could continue using TPQ-SA voluntarily (two did). Therapists thanked clients, and the research team expressed appreciation to therapists. Clients and therapists each completed separate evaluation questionnaires.

5 Results

5.1 Evaluation Questionnaire - Therapists

Three therapists completed the evaluation (see Table 2). Perceived data safety was high (M = 4.6/5). The pre-session instrument was rated comprehensive and relevant (both M = 4.0) but had mixed impact on session conduct (ratings 2–4), reflecting unpredictable session topics and overlap among categories. The post-session instrument scored higher on comprehensiveness/relevance (M = 4.6) with moderate perceived usefulness (M = 4.0). Therapists asked to add items on alliance, emotions, and session atmosphere; they also requested side-by-side access to clients’ post-session responses. App usability was high (M = 4.6); preferences included optional free-text and push notifications. One minor display issue was noted (line breaks for long Likert labels).

Table 2: Therapist Evaluation Questionnaire Results

Domain: General Experience
M / Rating: Safety M = 4.6 (1–5)
Insights / Feedback: Need for clearer guidance
Illustrative Quote(s): “clearer instructions for presenting to clients” (T1); “support in explaining technical aspects” (T2); “rehearsing client presentations” (T3)

Domain: Pre-session Questionnaire
M / Rating: Comprehensiveness M = 4; Relevance M = 4; Influence rated 4 and 2
Insights / Feedback: Difficult to predict topics; predefined aspects often intertwined; categories not natural but influenced focus; suggestion to rephrase items (wishes vs. intentions); frustration at not knowing client experience
Illustrative Quote(s): “I was mostly guessing... clients came up with topics that changed the session course.” (T1); “I rarely think about the aspects in such a structured way... that’s why it influenced the session.” (T2)

Domain: Post-session Questionnaire
M / Rating: Comprehensiveness & Relevance M = 4.6; Usefulness M = 4.0
Insights / Feedback: Number of items appropriate; easier to complete than pre-session; items clear but limited new insight; missed items on alliance, emotions, session atmosphere; difficulty categorizing events; interested in client post-session reports
Illustrative Quote(s): “relational aspect was missing” (T2); “I missed questions for [...] reflection, e.g. regarding alliance or feelings.” (T1); “Sometimes it was difficult to determine exactly in which area something happened for the client.” (T1)

Domain: User Experience (App)
M / Rating: Usability M = 4.6
Insights / Feedback: All used mobile app; Likert scales suitable for focus; suggestion to add free-text fields; post-session timing added time pressure; preference for push notifications; minor technical issue
Illustrative Quote(s): “It was an extra commitment between an already tight time-window between sessions.” (T1); “... notifications [...] would remind me to fill in the questionnaire...” (T2)

Note. Ratings are on a 1–5 Likert scale (1 = low/poor, 5 = high/excellent). M denotes the mean across N = 3 therapists. Quotes translated from Slovene.

5.2 Evaluation Questionnaire - Clients

Four clients completed the evaluation (see Table 3). Presentation clarity was moderate (purpose M = 3.5, procedure M = 3.75), while perceived data safety was very high (M = 5). The daily TPQ-SA was seen as moderately comprehensive (M = 4.25) and moderately difficult (M = 3.75) but of limited immediate usefulness (M = 2.75), with comments about obligation and notification timing; typical completion time was 3–4 minutes. The post-session questionnaire was rated comprehensive (M = 4.5) and moderately helpful (M = 4.0), with slightly lower perceived relevance (M = 3.75); most clients found the item count appropriate. App ease of use was high (M = 4.75), though clients noted irregular/missing prompts and requested notification timing control; one initial login issue was reported. One feedback interview was conducted and described as yielding no major insights.

Table 3: Client Evaluation Questionnaire Results

Domain: General Experience
M / Rating: Purpose clarity M = 3.5; Procedure clarity M = 3.75; Safety M = 5
Insights / Feedback: Clients felt very safe; C4 unsure about right to withdraw; all knew whom to contact; mixed responses on support/presentation prior to study
Illustrative Quote(s): (none)

Domain: TPQ
M / Rating: Comprehensiveness M = 4.25; Difficulty M = 3.75; Usefulness M = 2.75
Insights / Feedback: Often experienced as obligation; sometimes forgotten; completed in 3–4 minutes; varied opinions on timing; some items too general/redundant; valued specificity of emotion items; mixed views on item count; one feedback interview, no major insights
Illustrative Quote(s): “challenging because I often forgot and solved things in hindsight... [...] felt some pressure to complete it.” (C1); “understandable, simple, but [...] a kind of obligation.” (C3); “I answered all the items in a section the same”; “there weren’t any groundbreaking insights.” (C1)

Domain: Post-session Questionnaire
M / Rating: Comprehensiveness M = 4.5; Helpfulness M = 4.0; Relevance M = 3.75
Insights / Feedback: Number of items appropriate (3) or excessive (1); items clear but limited usefulness for reflection
Illustrative Quote(s): “I wouldn’t say that the items were particularly useful in reflecting on the therapy itself. But they were clear.” (C1)

Domain: User Experience
M / Rating: App Ease of use M = 4.75
Insights / Feedback: Technical issues: missing or irregular prompts; initial login difficulty (C3); desire for more control over notifications
Illustrative Quote(s): “notification... appeared exactly the other way around as it should... It would be better if I had some control over [it].” (C1); “The possibility to set the notification time.” (C4)

Note. Ratings are on a 1–5 Likert scale (1 = low/poor, 5 = high/excellent). M denotes the mean across N = 4 clients. Quotes translated from Slovene.

6 Discussion

This preliminary study aimed to evaluate the feasibility and effectiveness of a session-by-session monitoring system for both therapists and clients, in parallel to daily client measurements. The feedback gathered provides valuable insights into the strengths and areas for improvement of this methodology, which is intended to inform a larger-scale process-outcome study focusing on predicting therapeutic change.

Our findings echo prior pre-studies in three ways. First, feasibility with caveats: like earlier pilots, therapists and clients rated usability highly, yet clients sometimes experienced daily EMA as an obligation and usefulness dipped without structured feedback; prior work shows adherence and perceived value rise when regular feedback interviews are part of the protocol, something underused in our pilot (one interview only). Second, content focus: therapists’ request to add alliance/emotion-of-the-session items mirrors dyadic monitoring protocols that track alliance every session; incorporating these in our post-session set should increase clinical relevance. Third, implementation details matter: the notification glitches and timing issues we observed are the same levers highlighted in EMA literature as determinants of engagement. In short, our results are consistent with earlier pilots: daily monitoring is workable and accepted, but its utility depends on closing the loop (feedback), tuning item sets to the relational process, and getting the micro-UX right [12, 5, 16, 13, 24].

7 Limitations

This pilot has several limitations. The sample was very small (three therapists, four clients) and recruited by convenience/snowball methods, with role overlap (some authors as therapists), limiting generalizability and introducing possible expectancy and social-desirability biases. The observation window was short with few sessions per client, and feedback interviews were rarely used (one dyad), so acceptability and utility may be underestimated or mischaracterized. All primary measures were self-report; some post-session entries were delayed, possibly increasing recall bias but also possibly affording additional reflective processing and thus more considered responses. The session-related instruments underwent content-focused development only and were not psychometrically validated; bilingual translation/back-translation may still leave subtle construct drift. Platform issues (notification irregularities, a display bug for long labels) may have affected adherence. Finally, the study did not include the planned multimodal synchrony streams (audio/video/physiology) or session-level alliance items, constraining insight into barriers additional data collection methods might introduce.

8 Conclusion and Future Work

This pre-study primarily assessed feasibility and barriers rather than effects. Daily EMA plus brief session questionnaires proved implementable and acceptable, but value depended on workflow fit and feedback loops. Key friction points were methodological (very small convenience sample with role overlap; self-report only; delayed post-session entries), procedural (rare use of feedback interviews; pre-session “planned influence” sometimes felt guess-like amid emergent themes; lack of explicit alliance/emotion-of-session coverage), and technical (notification irregularities; minor display issues). These constraints shaped engagement and data quality as much as the instruments themselves, underscoring that successful monitoring is a service-design problem (stable micro-UX, clear rationale, and structured feedback), not merely a measurement problem.

Future work will be a larger trial focused on de-risking these barriers. It will prioritize pragmatic endpoints (adherence, timeliness, missingness, prompt reliability, time-to-completion, usability, protocol fidelity) and data-quality safeguards (harmonized scales, timestamp checks to quantify recall lag, basic psychometrics for EMPIS-Q). Finally, it will pilot the planned multimodal streams (session A/V and physiology) strictly for feasibility (consent rates, capture success, clinician burden) before testing prognostic utility in subsequent outcome-focused studies.

Funding

This work was partly funded by a Sigmund Freud University Vienna internal Initial Funding project grant (January–May 2025).

Acknowledgements

The authors would like to thank the participating clients for their time and effort.

References

[1] Jojanneke A. Bastiaansen, Maaike Meurs, Renee Stelwagen, Lex Wunderink, Robert A. Schoevers, Marieke Wichers, and Albertine J. Oldehinkel. 2018.
[6] Christoph Flückiger, A C Del Re, Bruce E Wampold, and Adam O Horvath. 2018. The alliance in adult psychotherapy: a meta-analytic synthesis. Psychotherapy (Chic.), 55, 4, (Dec. 2018), 316–340.
[7] Miguel M. Gonçalves et al. 2024. Developing a european psychotherapy consortium (epoc): towards adopting a single-item self-report outcome measure across european countries. Clinical Psychology in Europe, 6, 3, (Sept. 2024), 1–15. doi:10.32872/cpe.13827.
[8] Evangelia Katsoulakis et al. 2024. Digital twins for health: a scoping review. npj Digital Medicine, 7, 1, (Mar. 2024), 1–11. doi:https://doi.org/10.1038/s41746-024-01073-0.
[9] Tine Kolenik. 2022. Methods in digital mental health: smartphone-based assessment and intervention for stress, anxiety, and depression. In Integrating Artificial Intelligence and IoT for Advanced Health Informatics: AI in the Healthcare Sector. Carmela Comito, Agostino Forestiero, and Ester Zumpano, editors. Springer International Publishing, Cham, 105–128. isbn: 978-3-030-91181-2. doi:10.1007/978-3-030-91181-2_7.
[10] Ingrid Kramer et al. 2014. A therapeutic application of the experience sampling method in the treatment of depression: a randomized controlled trial. World Psychiatry, 13, 1, (Feb. 2014), 68–77. doi:https://doi.org/10.1002/wps.20090.
[11] Mary R. Lynn. 1986. Determination and quantification of content validity. Nursing Research, 35, 6, (Nov. 1986), 382–386. doi:https://doi.org/10.1097/00006199-198611000-00017.
[12] Rosa Michaelis, Friedrich Edelhäuser, Yvonne Hülsner, Eugen Trinka, and Günter Schiepek. 2022. Personalized high-frequency monitoring of a process-oriented psychotherapeutic approach to seizure disorders: treatment utilization and participants’ feedback. Psychotherapy, 59, 4, (Feb. 2022), 629–640. doi:https://doi.org/10.1037/pst0000430.
[13] Inez Myin-Germeys, Zuzana Kasanova, Thomas Vaessen, Hugo Vachon, Olivia Kirtley, Wolfgang Viechtbauer, and Ulrich Reininghaus. 2018. Experience sampling methodology in mental health research: new insights and technical developments. World Psychiatry, 17, 2, (May 2018), 123–132. doi:https://doi.org/10.1002/wps.20513.
[14] Ole Karkov Østergård, Hilde Randa, and Esben Hougaard. 2018. The effect of using the partners for change outcome management system as feedback tool in psychotherapy—a systematic review and meta-analysis. Psychotherapy Research, 30, 2, (Sept. 2018), 1–18. doi:https://doi.org/10.1080/10503307.2018.1517949.
[15] The Lancet Psychiatry. 2024. Global burden of disease 2021: mental health messages. The Lancet Psychiatry, 11, 8, (Aug. 2024), 573. doi:10.1016/S2215-0366(24)00222-0.
[16] Yvonne Schaffler, Andrea Jesser, Elke Humer, Katja Haider, Christoph Pieh, Thomas Probst, and Brigitte Schigl. 2024. Process and outcome of outpatient psychotherapies under clinically representative conditions in austria: protocol and feasibility of an ongoing study. Frontiers in psychiatry, 15, (Mar. 2024). doi:https://doi.org/10.3389/fpsyt.2024.1264039.
[17] Günter Schiepek. 2022. Prozess- und outcome-evaluation mithilfe des synergetischen navigationssystems (sns). German. Psychotherapie-Wissenschaft, 12, 1, 43–56. „Der TPB umfasst für die ambulante Therapie 33 Items (Kurzfassung: 24 Items).”. https://www.psychotherapie-wissenschaft.info/article/view/3969.
[18] Günter Schiepek, Benjamin Aas, and Kathrin Viol. 2016. The mathematics of psychotherapy: a nonlinear model of change dynamics. Nonlinear dynamics, psychology, and life sciences, 20, 3, (July 2016), 369–99. https://pubmed.ncbi.nlm.nih.gov/27262423/.
[19] Günter Schiepek, Benjamin Aas, and Kathrin Viol. 2016. The mathematics of psychotherapy: a nonlinear model of change dynamics. Nonlinear dynamics, psychology, and life sciences, 20, 3, 369–99. https://api.semanticscholar.org/CorpusID:40177925.
[20] Günter Schiepek, Wolfgang Aichhorn, Martin Gruber, Guido Strunk, Egon Bachler, and Benjamin Aas. 2016. Real-time monitoring of psychotherapeutic processes: concept and compliance. Frontiers in Psychology
doi:10.3389/f Self-monitoring and personalized feedback based on the experiencing sam- psyg.2016.00604. pling method as a tool to boost depression treatment: a protocol of a prag- [21] Günter Schiepek, Wolfgang Aichhorn, and Guido Strunk. 2012. Der therapie- matic randomized controlled trial (zelf-i). , 18, 1, (Sept. 2018). BMC Psychiatry Zeitschrift prozessbogen (tpb)–faktorenstruktur und psychometrische daten. doi:https://doi.org/10.1186/s12888- 018- 1847- z. für Psychosomatische Medizin und Psychotherapie, 58, 3, 257–266. [2] Godfred O. Boateng, Torsten B. Neilands, Edward A. Frongillo, Hugo R. [22] Günter Schiepek, Heiko Eckert, Benjamin Aas, Sebastian Wallot, and Anna Melgar-Quiñonez, and Sera L. Young. 2018. Best practices for developing Integrative psychotherapy: A feedback-driven dynamic systems Wallot. 2016. and validating scales for health, social, and behavioral research: a primer. approach . Hogrefe Publishing GmbH. Frontiers in Public Health, 6, 149, (June 2018). doi:https://doi.org/10.3389/f pu [23] Günter Schiepek, Barbara Stöger-Schmidinger, Helmut Kronberger, Wolf- bh.2018.00149. gang Aichhorn, Leonhard Kratzer, Peter Heinz, Kathrin Viol, Anna Lichtwarck- [3] Pasquale Bufano, Marco Laurino, Sara Said, Alessandro Tognetti, and Danilo Aschoff, and Helmut Schöller. 2019. The therapy process questionnaire - Menicucci. 2023. Digital phenotyping for monitoring mental disorders: sys- factor analysis and psychometric properties of a multidimensional self- tematic review. , 25, (Dec. 2023), e46778. doi:10.2196/46778. J Med Internet Res rating scale for high-frequency monitoring of psychotherapeutic processes. [4] Kim de Jong, Judith M. Conijn, Roisin A.V. Gallagher, Alexandra S. Reshet- Clinical Psychology Psychotherapy , 26, 5, (July 2019), 586–602. doi:https://d nikova, Marya Heij, and Miranda C. Lutz. 2021. Using progress feedback to oi.org/10.1002/cpp.2384. improve outcomes and reduce drop-out, treatment duration, and deteriora- [24] Matej Vajda. 2024. 
Towards a Possible Solution of Chalmers’ Hard Problem and to Definitions of Life and Consciousness

Marko Vitas †
Independent Researcher
Laze pri Borovnici 38, Borovnica, 1353 Slovenia
vitas.marko83@gmail.com

Abstract/Povzetek

There is no consensus about what cognition and its emergent form, consciousness, are. Yet this expanded abstract proposes a new definition of consciousness. As many researchers, philosophers and other thinkers believe that life means cognising, this new definition of consciousness stems from a generalisation of the existing Vitas & Dobovišek definition of life, which postulates that Life is a far from equilibrium self-maintaining chemical system capable of processing, transforming, and accumulating information acquired from the environment. The new definition includes the thermodynamical aspect as a far from equilibrium system and considers the flow of information from the environment to a conscious system. The new definition of consciousness is formulated in a minimal manner; simultaneously, it is general enough to cover all emergent forms of cognition, e.g. thinking and rationality. The newly formulated definition states that Consciousness is an emergent property of a far from equilibrium system of quantum particles sustained by an autopoietic system and capable of processing, transforming, and accumulating information acquired from the environment. The newly proposed definition of consciousness may be of interest to cognitive and computer sciences – and even to the development of artificial intelligence.

I propose another possible generalisation by introducing quantum particles to the Vitas & Dobovišek definition of life, refining it into a broader concept: Life is a far from equilibrium self-maintaining system of quantum particles capable of processing, transforming, and accumulating information acquired from the environment. A question might be posed here, whether we are not therefore encompassing other complex forms of matter which cannot be considered as life. It is here worth mentioning that some authors, for instance, consider dusty plasmas from the thermosphere – although they are not self-sustaining – as a fourth state of matter and fourth domain of life, something in between non-living and living matter.

The newly formulated definition of consciousness presents a possible solution to Chalmers’ hard problem of consciousness. By including quantum particles, which do not have classical trajectories, in my definition of consciousness, which is apparently an emergent property of cognition, the solution to Chalmers’ hard problem may be found in the introduction of additional multiple dimensions. Organisms (including individual cells) are those who interpret; the interpretation process or semiosis (in the sense of C. Peirce) is the process of life. Digital coding might relate to the reduction of dimensions, and it is highly context-dependent, like the digital coding of analogue three-dimensional protein structures in unidimensional, linear genetic sequences or, vice versa, the expansion of dimensions after interpretation, i.e. the translation of linear digital genetic sequences into three-dimensional analogue protein structures. At this point, it is worth mentioning that interpretants should have the same dimension as analogue structures, providing additional information. Each biopolymer is an emergent molecule. Evolution gives rise to emergence. Undoubtedly, there is an emergence of three-dimensional structures from linear unidimensional digital sequences. Likewise, consciousness is an interpretant of the signals coming from the environment. Adding extra dimensions for the interpretant might shed new light on problems connected with consciousness, including Chalmers’ hard problem. Yet perhaps a question worth posing at this point is whether we are not living in some sort of hyper-digital world coding for a hyper-analogue world. Could this view present a possible solution to Chalmers’ hard problem of consciousness?

Keywords/Ključne besede

Definition of Consciousness, Definition of Life, Origins of Life, Chalmers Hard Problem, Cognition, Far from Equilibrium

Acknowledgments/Zahvala

Sincere thanks to Andrei Igamberdiev for reading the manuscript and suggesting valuable clues and constructive comments to enhance its quality. The author is also grateful to David H. Wolpert, Jan Karbowski, Pamela Lyon and Andrej Dobovišek for insightful comments regarding issues presented in this Abstract. Thanks also to Arto Annila for reading my articles and to everyone who contributed to the successful launch of the newly proposed definition of consciousness.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
Information Society 2025, 6–10 October 2025, Ljubljana, Slovenia
© 2025 Copyright held by the owner/author(s).
https://doi.org/10.70314/is.2025.cogni.1

References/Literatura

Vitas, M. & Dobovišek, A. (2017) On a quest of reverse translation. Found Chem 19, 139–155. https://doi.org/10.1007/s10698-016-9260-5
Vitas, M. & Dobovišek, A. (2019) Towards a General Definition of Life. Origins of Life and Evolution of Biospheres 49, 77–88. https://doi.org/10.1007/s11084-019-09578-5
Vitas, M. (2025) Towards a Possible Definition of Consciousness. BioSystems 254, 105526. https://doi.org/10.1016/j.biosystems.2025.105526

Analiza kognitivnih zmogljivosti LLM: Strateško načrtovanje z uporabo testa Tower of London
Evaluating LLM Cognitive Capabilities: A Strategic Planning Analysis Using the Tower of London Test

Katarina Žužek
Kognitivna znanost, Univerza v Novem mestu, Fakulteta za ekonomijo in informatiko, Slovenija

Matjaž Gams
Oddelek za inteligentne sisteme, Institut "Jožef Stefan", Jamova cesta 39, 1000 Ljubljana, Slovenija

Abstract

This paper investigates the strategic planning capabilities of large language models (LLMs) using a text-based adaptation of the Tower of London (ToL) test. Five of the most relevant models at the time of testing were evaluated: DeepSeek V3, Grok-3, Gemini 2.0 Flash, Qwen 235B-A22B, and Mistral 12B, on tasks of varying complexity. Performance was closely tied to model architecture and size, with clear limitations emerging in tasks of high cognitive demand. The findings highlight LLMs' potential to approximate human cognitive processes while emphasizing the need for optimized training data and advanced architectures to enhance planning capabilities. This research bridges cognitive science and artificial intelligence, opening new avenues for using standardized psychological tests to evaluate LLMs.

Keywords

artificial intelligence, large language models, planning, Tower of London, cognitive science

1 Introduction

In recent decades the field of artificial intelligence has developed remarkably. From classical symbolic approaches, which tried to imitate human logic with explicit rules, we have moved to adaptive models that are based on neural architectures and learn from large data collections. Modern LLMs, based on the Transformer architecture, have fundamentally changed natural language processing over the past decade [2, 3]. Their ability to generate coherent and semantically rich text has raised interest in their cognitive capabilities beyond language production alone, reaching into areas such as logical reasoning and strategic planning. Despite large improvements in language, LLMs remain limited in tasks of abstract understanding, causal reasoning, and long-term planning. These limitations are especially pronounced in complex problems that require holding information in working memory and flexible thinking [4]. Binz et al. [11] similarly point out that, despite progress, LLMs often fail at tasks requiring multi-step strategic planning and generalization to unfamiliar situations.

The Tower of London test, related to the Tower of Hanoi puzzle and rendered in Slovene as »Londonski stolpi«, is an established tool in cognitive science that allows precise assessment of spatial planning and logical reasoning through tasks of rearranging discs or balls on three pegs [2, 9]. Recent review studies have stressed that linking neural networks with symbolic mechanisms of understanding will be key to reaching higher cognitive abilities [12]. Figure 1 shows the sequence of moves for a simple task of this game.

This study assesses the strategic planning capabilities of LLMs. We used a methodologically adapted text version of the Tower of London test, which is designed to measure executive functions and strategic planning. The main aim of the study is to determine how different LLMs perform on tasks of different difficulty levels and how their architecture affects their success in solving the tasks.

Information Society 2025, 6-10 October 2025, Ljubljana, Slovenia
© 2025 Copyright held by the owner/author(s).
https://doi.org/10.70314/is.2025.cogni.2
2 Theoretical framework and methodology

Strategic planning is a key executive function that involves sequencing actions to reach a set goal. It enables the organization of behaviour, the formation of strategies, and the anticipation of future states, which is essential for adapting effectively to complex environments. The test assesses spatial planning ability through the manipulation of balls on pegs, where participants must find the optimal path from the start configuration to the goal configuration. For this study we adapted the test into a text form that allows standardized interaction with LLMs.

2.1 Planning as a cognitive function and the Tower of London

The ToL test, developed by Shallice [10], assesses these abilities through the manipulation of balls on three pegs. Participants must find the optimal path from the start to the goal configuration with the minimum number of moves while observing the rules. The main requirement of the test is to solve the task in the minimum number of moves, subject to simple rules such as moving one ball per move and the capacity limits of the pegs. Success on the test is associated with working-memory functions, flexibility of thinking, and the ability to inhibit impulsive decisions [9].

2.2 Adapting the ToL test for LLMs

Adapting the test was essential to enable standardized interaction with models that are primarily designed to process and generate text. Each task contains a precise description of the start and goal configurations and of the movement rules. For example, one of the five-move tasks was worded as follows:

"Solve the ToL test on a board with three vertical pegs, where the coloured balls are arranged as follows: Peg 1: [red (top), green (bottom)], Peg 2: [blue (alone)], Peg 3: [empty]. Goal configuration: Peg 1: [blue (top), green (middle), red (bottom)], Peg 2: [(empty)], Peg 3: [(empty)]. Give the sequence of moves that reaches the goal in the minimum number of steps, observing the rules: move only the top or a single ball, move one ball per move, respect the peg capacity limits (Peg 1: at most three balls, Peg 2: at most two, Peg 3: at most one) and the rules of gravity (balls drop from the top down)."

The ability to understand the instructions, produce a correct sequence of moves, and reach the goal configuration in the minimum number of moves served as the primary indicator of success in assessing strategic planning [6] [9].

2.3 Model selection and testing procedure

The research sample comprised five contemporary, architecturally diverse LLMs: DeepSeek V3, Grok-3, Gemini 2.0 Flash, Qwen 235B-A22B, and Mistral 12B. DeepSeek V3 and Qwen 235B-A22B, known for advanced architectures (e.g. mixture of experts) and extensive training data, represented the upper tier of capability. Grok-3 and Gemini 2.0 Flash were chosen for their high processing speed and specific optimizations, while Mistral 12B was included as an example of a smaller but efficient model. A similar experimental approach was used by Xu et al., who assessed the executive functions of artificial intelligence [13] using standardized cognitive tasks of varying difficulty.

Figure 1: Solving a task in the Tower of London test.

Testing was carried out with seven ToL tasks whose difficulty was defined by the minimum number of moves (from 2 to 7). Each task required the models to produce a sequence of moves that reaches the goal configuration of balls on the pegs under clearly defined rules that mirror the original ToL test [6]. The models were expected to produce a sequence of moves leading to the correct solution in the minimum number of steps.

The data were collected in April 2025 in isolated sessions without additional context, to ensure uniform conditions for all models. Success was assessed with two criteria: correctness and optimality. Correctness means whether the goal configuration was reached. Optimality was measured by whether the model solved the task with the smallest possible number of moves, which indicates its capacity for strategic planning. This criterion was binary: a value of 1 was assigned only if the model used exactly as many moves as the theoretical minimum for the given task; otherwise (even with a single extra move) the value was 0. A solution was rated successful only if it was both correct and optimal. This means that the model had to reach the goal configuration in exactly the minimum number of moves; any deviation was treated as failure on the whole task. The results were recorded in a structured Excel table and analysed statistically in SPSS using a one-sample t-test (to test for success above 50%) and the Kruskal-Wallis test to compare success between models [7].

2.4 Data analysis

The collected data on the success of the individual LLMs on tasks of varying difficulty were analysed with the statistical tool SPSS. We focused on assessing success in terms of the correctness and optimality of the solutions. The statistical analysis made it possible to compare success between models and to identify statistically significant differences with respect to their architecture and task difficulty. The results show that increasing task difficulty negatively affects the success of all models (p < 0.01).
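The task encoding and the binary scoring rule described in Sections 2.2 and 2.3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure the authors used (they report Excel and SPSS): the state representation, the function names `legal_moves`, `min_moves`, and `score`, and the zero-based peg indices are all introduced here. The start and goal states are those of the five-move example prompt, and the minimum is found by breadth-first search.

```python
from collections import deque

# Peg capacities from the example prompt: peg 1 holds 3 balls, peg 2 holds 2, peg 3 holds 1.
CAPACITY = (3, 2, 1)


def legal_moves(state):
    """Yield (src, dst, next_state) for every legal single-ball move.

    A state is a tuple of three tuples, each listing balls bottom-to-top.
    Only the top ball of a peg may move, and only onto a peg with free capacity.
    """
    for src, stack in enumerate(state):
        if not stack:
            continue
        for dst in range(len(state)):
            if dst == src or len(state[dst]) >= CAPACITY[dst]:
                continue
            pegs = [list(p) for p in state]
            pegs[dst].append(pegs[src].pop())
            yield src, dst, tuple(tuple(p) for p in pegs)


def min_moves(start, goal):
    """Breadth-first search for the minimum number of moves; None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        state, depth = queue.popleft()
        if state == goal:
            return depth
        for _, _, nxt in legal_moves(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return None


def score(start, goal, moves):
    """Return 1 only if `moves` is legal, reaches `goal`, and is of minimum length."""
    state = start
    for src, dst in moves:
        options = {(s, d): nxt for s, d, nxt in legal_moves(state)}
        if (src, dst) not in options:
            return 0  # illegal move: the whole task counts as failed
        state = options[(src, dst)]
    return int(state == goal and len(moves) == min_moves(start, goal))


# Example prompt: peg 1 = green (bottom), red (top); peg 2 = blue; peg 3 = empty.
START = (("green", "red"), ("blue",), ())
GOAL = (("red", "green", "blue"), (), ())

print(min_moves(START, GOAL))  # 5

# One optimal answer, written as (source peg, destination peg) pairs.
optimal = [(0, 2), (0, 1), (2, 0), (1, 0), (1, 0)]
print(score(START, GOAL, optimal))  # 1
```

A sequence that reaches the goal with one extra move, or that contains an illegal move, scores 0 under this rule, matching the all-or-nothing criterion stated in the text.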
Grok-3 achieved the highest average success (80.95%, 17/21), followed by DeepSeek V3 (66.67%, 14/21) and Qwen 235B-A22B (61.90%, 13/21). Gemini 2.0 Flash (42.86%, 9/21) and Mistral 12B (33.33%, 7/21) showed lower success, especially on tasks requiring more than four moves. No model solved the seven-move task, which reveals limits in handling high cognitive demand.

The Kruskal-Wallis test confirmed statistically significant differences between models (H = 22.03, df = 4, p < 0.001). Qwen 235B-A22B and DeepSeek V3 proved more robust on demanding tasks, which can be attributed to advanced architectures, such as mixture of experts, and to extensive training data [5]. The one-sample t-test showed that the models' average success (57.14%, SD = 0.49) exceeds the 50% reference threshold (t(104) = 11.92, p < 0.05), which indicates an ability to solve the tasks systematically, although the models lag behind human success (70–80%) on demanding tasks [1].

3 Results

The empirical evaluation confirmed a statistically significant correlation between the architectural complexity of the models and their success in strategic planning. Qwen 235B-A22B and DeepSeek V3 achieved high average success across tasks of varying difficulty, surpassed only by Grok-3, which supports the thesis that larger models with advanced architectures handle complex tasks more effectively. On tasks requiring 4 to 6 moves their success was high, but it broke down on the most demanding seven-move task, which no model solved; this reveals the limits of current architectures.

Smaller models, such as Gemini 2.0 Flash and Mistral 12B, were effective on simpler tasks (2-3 moves) but quickly reached their limits as difficulty increased.

4 Analysis and discussion

The results highlight a duality in the capabilities of LLMs. On the one hand, models such as DeepSeek V3 and Qwen 235B-A22B show remarkable potential for solving complex cognitive tasks that go beyond language processing alone. Success on ToL tasks of medium difficulty suggests that their architectures, supported by extensive training data, enable a degree of logical reasoning and strategic planning that is in some cases comparable to human abilities. Nevertheless, these results are only a statistical approximation of human solutions under specific conditions and do not amount to a simulation of cognitive processes. This becomes especially clear on more complex tasks that exceed the models' capabilities, since the models operate on probabilistic correlations rather than understanding. On the other hand, the study reveals characteristic limitations of these models. All models failed on the most demanding seven-move ToL task, which shows a lack of capacity for abstract, multi-step thinking and long-term planning. This is probably a consequence of their statistical nature, which rests on pattern recognition. The models may get "stuck" in local optima and have limited capacity to process information in working memory, which restricts the solving of complex problems.

Using a text format for the ToL test, although it enables standardized interaction with the models, is a limitation. The process of understanding and solving tasks in text form may differ substantially from the visuo-spatial approach used when testing humans. Future research could focus on hybrid approaches that combine language models with visual models in order to test their problem-solving abilities in a multimodal context [8].

It would also be interesting to examine how LLMs respond to tasks with a higher degree of ambiguity or uncertainty, which could draw on methods from decision theory or Bayesian networks. Such tasks require the ability to estimate probabilities and choose between alternatives under incomplete information, an important property of advanced cognition. This would extend the framework for evaluating artificial intelligence in terms of adaptivity and robustness in real, non-standard situations [12]. To this end, recent research proposes developing cognitive architectures capable of internal goal representation and multi-step reasoning [14], which could enable more effective strategic planning in models.

Besides the ToL test there are many other standardized cognitive tests, such as the Wisconsin Card Sorting Test (WCST), Raven Progressive Matrices, or the Stroop test, that could be adapted for evaluating LLMs. These tests cover different cognitive domains, from flexible thinking to reasoning and inhibition, and would allow a more complete assessment of artificial intelligence in comparison with human participants.

Continuing research in this direction could support the development of hybrid models that include both neural and symbolic components, in line with current guidelines for developing explainable and robust artificial intelligence [11]. It would also make sense to include tests of analogical reasoning, as proposed by Ghosh and Holyoak [15], since these are indicators of higher-order cognition in models and are well established in psychology.

For further research we recommend enlarging the sample, including additional task types (e.g. logical-mathematical, creative), and analysing model errors in depth. Ethical aspects, such as data privacy, fairness, and the environmental impact of training models, also matter in the context of ToL research. The large datasets used to train models may include sensitive information, e.g. data from cognitive studies with human participants; bias in evaluation data can distort results when simulating cognitive tasks; and the high energy consumption of testing models affects the environment. These aspects require careful treatment and additional attention [8].

5 Conclusion

The study confirms the substantial potential of LLMs for simulating cognitive processes, but their strategic planning capabilities remain limited. This is especially evident in high-level abstract reasoning and in solving non-standard problems that require multi-step logic. Further progress will depend on developing hybrid approaches that combine the statistical power of deep learning with more symbolic and logical approaches to reasoning.

Such integration would enable the development of truly robust and explainable artificial intelligence that could become an effective partner in solving complex problems. The study thus forms a bridge between cognitive science and artificial intelligence and opens the door to further use of standardized cognitive tests in evaluating the advanced capabilities of artificial intelligence. Future studies should also examine how training data and architectural innovations, such as working-memory mechanisms, are reflected in strategic planning abilities [8].

The study, conducted in April 2025, is already partly outdated because of rapid progress in LLMs. Newer systems have since appeared, such as GPT-5, Claude 4, Grok-4, and Gemini 2.5, which achieve better results on abstract reasoning tasks and cognitive challenges similar to the ToL test. These advances point to the need for further research to better understand the capabilities and limitations of modern artificial intelligence models.

References

[1] Boccia, M. et al. (2017). The Tower of London (ToL) in Italy: standardization of the ToL test in an Italian population. Neurological Sciences, 38(7), 1263–1270. https://doi.org/10.1007/s10072-017-2957-y
[2] Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., von Arx, S. V., Bernstein, M. S., Bohg, J., Bosselut, A., Brunskill, E., Brynjolfsson, E., Buch, S., Card, D., Castellon, R., Chatterji, N. S., Chen, A. S., Creel, K. A., Davis, J., Demszky, D., Donahue, C. and Liang, P. (2021). On the opportunities and risks of foundation models. https://doi.org/10.48550/arXiv.2108.07258
[3] Brown, T. B., Mann, B., Ryder, N. and Amodei, D. (2020). Language models are few-shot learners. https://doi.org/10.48550/arXiv.2005.14165
[4] Chollet, F. (2019). On the measure of intelligence. https://doi.org/10.48550/arXiv.1911.01547
[5] DeepSeek-AI, Liu, A., Feng, B., Xue, B., Wang, B., Wu, B., Lu, C., Zhao, C., Deng, C., Zhang, C., Ruan, C., Dai, D., Guo, D., Yang, D., Chen, D., Ji, D., Li, E., Lin, F., Dai, F., Luo, F. and Pan, Z. (2024). Large language models and cognitive science: A comprehensive review of similarities, differences, and challenges. https://doi.org/10.48550/arXiv.2409.02387
[6] Fimbel, E., Lauzon, S. and Rainville, C. (2009). Performance of humans vs. exploration algorithms on the Tower of London test. PLoS ONE, 4(7), e7263. https://pdfs.semanticscholar.org/29f9/e4c7671f7bfd20a487ef9f913bdac53536a8.pdf
[7] Harsa, P., Břeňová, M., Bezdicek, O. and Michalec, J. (2022). Tower of London test - short version. Neurologia i Neurochirurgia Polska, 56(3), 243–250. https://doi.org/10.5603/PJNNS.a2022.0037
[8] Hernández-Orallo, J. (2017). Evaluation in artificial intelligence: from task-oriented to ability-oriented measurement. Artificial Intelligence Review, 48(3), 397–447. https://doi.org/10.1007/s10462-016-9505-7
[9] Kaller, C. P., Unterrainer, J. M. and Stahl, C. (2012). Assessing planning ability with the Tower of London task: Psychometric properties of a structurally balanced problem set. Psychological Assessment, 24(1), 46–53. https://doi.org/10.1037/a0025174
[10] Shallice, T. (1982). Specific impairments of planning. Philosophical Transactions of the Royal Society B, 298(1089), 199–209. https://doi.org/10.1098/rstb.19
[11] Binz, M. and Schulz, E. (2024). Evaluating Planning and Reasoning in Language Models. Nature Machine Intelligence. DOI: 10.1038/s42256-024-00896-1
[12] Lake, B. M., Ullman, T. D. and Tenenbaum, J. B. (2024). Symbolic reasoning in the age of deep learning. Annual Review of Psychology. DOI: 10.1146/annurev-psych-030322-020111
[13] Xu, Y., et al. (2025). Assessing Executive Function in AI Systems Using Cognitive Benchmarks. Cognitive Computation, 17(1). DOI: 10.1007/s12559-025-10200-6
[14] Creswell, A., Shanahan, M. and Kaski, S. (2025). Cognitive Architectures for Multistep Reasoning in LLMs. Journal of Artificial General Intelligence. DOI: 10.2478/jagi-2025-0003
[15] Ghosh, A. and Holyoak, K. J. (2025). Analogical Reasoning in Large Language Models: Limits and Potentials. Cognitive Science, 49(2). DOI: 10.1111/cogs.13301

Indeks avtorjev / Author index

Aichhorn Wolfgang: 48, 52, 56
Bangerl Waltraud: 32
Beris Ayse Nur: 32
Bratko Ivan: 15
Bregant Tina: 7
Bründlmayer Anselm: 32
Bušelič Benjamin: 11
Caporusso Jaya: 41
Czernin Klara: 32
Farič Ana: 15
Gams Matjaž: 21, 63
Jablanovec Andrej: 11
Jamšek Monika: 21
Jordan Marko: 21
Justin Martin
28 Kolenik Tine ..................................................................................................................................................................... 48, 52, 56 Kovačević Tojnko Nuša ............................................................................................................................................................... 56 Križan Tia..................................................................................................................................................................................... 41 Laczkovics Clarissa ...................................................................................................................................................................... 32 Lodrant Katarina .......................................................................................................................................................................... 32 Melinščak Filip ............................................................................................................................................................................. 32 Mono Louis .................................................................................................................................................................................. 37 Možina Miran ............................................................................................................................................................................... 56 Oprešnik Luka .............................................................................................................................................................................. 41 Pavlinič Renata ............................................................................................................................................................................... 
7 Purg Suljič Nina ........................................................................................................................................................................... 11 Repovš Grega ............................................................................................................................................................................... 11 Rožič Tatjana ............................................................................................................................................................................... 56 Scharnowski Frank ....................................................................................................................................................................... 32 Schiepek Günter ............................................................................................................................................................... 48, 52, 56 Schneider Valentin ....................................................................................................................................................................... 32 Šinkovec Patricija ........................................................................................................................................................................... 7 Slana Ozimič Anka....................................................................................................................................................................... 11 Slapničar Gašper .......................................................................................................................................................................... 56 Smodiš Rok ............................................................................................................................................................................ 
21, 48 Šonc Oskar ................................................................................................................................................................................... 48 Steyrl David ................................................................................................................................................................................. 32 Šutar Mateja ................................................................................................................................................................................. 52 Trpin Borut ................................................................................................................................................................................... 28 Vajda Matej .................................................................................................................................................................................. 56 Vitas Marko .................................................................................................................................................................................. 61 Žužek Katarina ............................................................................................................................................................................. 63 67 Kognitivna znanost Cognitive Science Uredniki l Editors: Anka Slana Ozimič Borut Trpin Toma Strle