SYNTAXFEST 2025 5 Events for 1 Fest of Empirical Syntax Ljubljana, 26 to 29 August 2025 Conference Guide SyntaxFest 2025 5 Events for 1 Fest of Empirical Syntax Ljubljana, 26 to 29 August 2025 Edited by: Kaja Dobrovoljc and Luka Terčon Layout: Jure Preglau Front page logo by: Kim Gerdes Published by: Založba Univerze v Ljubljani (University of Ljubljana Press) Issued by: Znanstvena založba Filozofske fakultete Univerze v Ljubljani (University of Ljubljana Press, Faculty of Arts) For the publisher: Gregor Majdič, Rector of the University of Ljubljana For the issuer: Mojca Schlamberger Brezar, Dean of the Faculty of Arts, University of Ljubljana Ljubljana, 2025 First e-edition. Digital copy of the book is available at: https://ebooks.uni-lj.si/zalozbaul/ DOI: 10.4312/9789612976361 Kataložni zapis o publikaciji (CIP) pripravili v Narodni in univerzitetni knjižnici v Ljubljani COBISS.SI-ID 245972739 ISBN 978-961-297-636-1 (PDF) SyntaxFest 2025 5 Events for 1 Fest of Empirical Syntax Ljubljana, 26 to 29 August 2025 Conference Guide Venue: Faculty of Law, University of Ljubljana (Poljanski nasip 2, Ljubljana) CONTENTS Introduction 5 Conference schedule 6 18th International Conference on Parsing Technologies 13 8th Universal Dependencies Workshop 18 8th International Conference on Dependency Linguistics 31 23rd Workshop on Treebanks and Linguistic Theories 42 3rd Workshop on Quantitative Syntax 54 Organization 67 Support 73 4 INTRODUCTION We are delighted to welcome you to SyntaxFest 2025 in Ljubljana, Slovenia. Continuing the tradition of previous editions in Paris (2019), Sofia (2021), and Washington DC (2023), SyntaxFest 2025 unites five independent yet closely connected events under one roof: the 18th International Conferen-ce on Parsing Technologies (IWPT 2025), the 8th Universal Dependencies Workshop (UDW 2025), the 8th International Conference on Dependency Linguistics (DepLing 2025), the 23rd Workshop on Treebanks and Linguistic Theories (TLT 2025), and the 3rd Workshop on Quantitative Syntax (QU-ASY 2025). Two pre-conference workshops organised by the COST Acti-on CA21167 Universality, Diversity and Idiosyncrasy in Language Technology (UniDive) are also held in conjunction with the main event. These events share a common focus on using corpora and treebanks to study syntax from both theoretical and computational perspectives, with growing emphasis on multilingual and cross-linguistic contexts. In additi-on to Slovenia being a fitting example of a small language community that has benefited from the openness, collaboration, and multilingual outlook fostered by this research community, hosting SyntaxFest in Ljubljana feels especially meaningful as the city was once home to the French linguist Lu-cien Tesnière (1893–1954), whose pioneering work laid the foundations of dependency grammar. Against this backdrop, we extend our thanks to all who have contributed to making SyntaxFest 2025 possible. We are deeply grateful to the authors for bringing new ideas, analyses, and resources to the table; to our excep-tional keynote speakers for taking the time to share their expertise and vision; and to the reviewers for their thoughtful work in shaping a high--quality program. Most importantly, we thank the workshop chairs for jo-ining forces in this unique collaborative format, which has made it a truly community-driven effort. Finally, we also thank our organising institutions for their support, our spon-sors for making the event possible, the ACL Anthology for ensuring the proceedings are openly accessible, and our fellow organising committee members for their dedication behind the scenes. Kaja Dobrovoljc Chair of the SyntaxFest 2025 Local Organising Committee 5 CONFERENCE SCHEDULE All conference sessions will take place at the Faculty of Law, University of Ljubljana (Poljanski nasip 2, Ljubljana). Talks will be held in the Red Hall, while poster sessions and coffee breaks will be organized in the lobby in front of the Red Hall. The following pages provide an overview of the daily program, including workshops, keynotes, and social events. For the most up-to-date version, visit the online program: syntaxfest.github.io/syntaxfest25/programme. html. TUESDAY, 26 August 2025 Session 1 14:00 – 14:30 Conference Opening Chair: Kaja Dobrovoljc 14:30 – 15:20 Keynote: Isabel Papadimitriou (Harvard University) What Can We Learn from Language Models? Chair: Stephan Oepen 15:20 – 15:50 Coffee Break Session 2 – IWPT Chair: Miryam de Lhoneux 15:50 – 16:10 Step-by-step Instructions and a Simple Tabular Output Format Improve the Dependency Parsing Accuracy of LLMs Hiroshi Matsuda, Chunpeng Ma, Masayuki Asahara 16:10 – 16:30 An Efficient Parser for Bounded-Order Product-Free Lambek Categorial Grammar via Term Graph Jinman Zhao, Gerald Penn 16:30 – 16:50 Crosslingual Dependency Parsing of Hawaiian and Cook Islands Māori using Universal Dependencies Gabriel H. Gilbert, Rolando Coto-Solano, Sally Akevai Nicholas, Lauren Houchens, Sabrina Barton, Trinity Pryor 16:50 – 17:10 CCG Revisited: A Multilingual Empirical Study of the Kuhlmann- Satta Algorithm Paul He, Gerald Penn 17:10 – 17:30 High-Accuracy Transition-Based Constituency Parsing John Bauer, Christopher D Manning Group Photo (Staircase) 17:30 – 20:00 Welcome Reception 6 WEDNESDAY, 27 August 2025 Session 3 – UDW Chair: Gosse Bouma 09:00 – 09:50 Keynote: Miryam de Lhoneux (KU Leuven) Typologically informed NLP evaluation 09:50 – 10:10 TreEn: A Multilingual Treebank Project on Environmental Discourse Adriana Silvina Pagano, Patricia Chiril, Elisa Chierchiello, Cristina Bosco 10:10 – 10:30 Crossing Dialectal Boundaries: Building a Treebank for the Dialect of Lesbos through Knowledge Transfer from Standard Modern Greek Stavros Bompolas, Stella Markantonatou, Angela Ralli, Antonios Anastasopoulos 10:30 – 11:00 Coffee Break Session 4 – UDW Chair: Bruno Guillaume 11:00 – 11:15 Negation in Universal Dependencies Jamie Yates Findlay, Dag Trygve Truslew Haug 11:15 – 11:30 A UD Treebank for Bohairic Coptic Amir Zeldes, Nina Speransky, Nicholas E. Wagner, Caroline T. Schroeder 11:30 – 11:45 Annotation of Relative Forms in the Egyptian-UJaen Treebank Roberto A. Diaz Hernandez, Daniel Zeman 11:45 – 12:00 MultiBLiMP 1.0: A Massively Multilingual Benchmark of Linguistic Minimal Pairs Jaap Jumelet, Leonie Weissweiler, Arianna Bisazza 12:00 – 12:15 Universal Dependencies for Suansu Jessica K. Ivani, Kira Tulchynska 12:15 – 12:30 Building UD Cairo for Old English in the Classroom Lauren Levine, Junghyun Min, Amir Zeldes 12:30 – 14:00 Lunch (Cafeteria) Session 5 – UDW Chair: Dag Haug 14:00-14:15 ShUD: the First Shanghainese Universal Dependency Treebank Qizhen Yang 14:20-14:35 Parallel Universal Dependencies Treebanks for Turkic Languages Arofat Akhundjanova, Furkan Akkurt, Bermet Chontaeva, Soudabeh Eslami, Cagri Coltekin 14:35-14:50 Towards better annotation practices for symmetrical voice in Universal Dependencies Andrew Thomas Dyer, Colleen Alena O'Brien 7 14:50 – 15:10 Annotating Second Language in Universal Dependencies: a Review of Current Practices and Directions for Harmonized Guidelines Arianna Masciolini, Aleksandrs Berdicevskis, Maria Irena Szawerna, Elena Volodina 15:10 – 15:30 Reference and Modification in Universal Dependencies Joakim Nivre, William Croft 15:30 – 16:00 Coffee Break Session 6 – Joint Poster Session A Chair: Cagri Coltekin 16:00 – 16:30 Lightning talks (2 min per poster) Location: Red Hall 16:30 – 17:30 Poster Session Location: Lobby • UD Treebanks for Esperanto as a natural language Masanori Oya • UD-English-CHILDES: A Collected Resource of Gold and Silver Universal Dependencies Trees for Child Language Interactions Xiulin Yang, Zhuoxuan Ju, Lanni Bu, Zoey Liu, Nathan Schneider • Universal Dependencies for Sindhi John Bauer, Sakeena Shah, Muhammad Shaheer, Mir Afzal Ahmed Talpur, Zubair Sanjrani, Sarwat Qureshi, Shafi M Pirzada, Christopher D Manning, Mutee U Rahman • Universal Dependencies Treebank for Khoekhoe (KDT) Kira Tulchynska, Sylvanus Job, Alena Witzlack-Makarevich • Extending the Enhanced Universal Dependencies – addressing subjects in pro-drop languages Magali Sanches Duran, Elvis A. de Souza, Maria das Graças Volpe Nunes, Adriana Silvina Pagano, Thiago A. S. Pardo • Developing a Universal Dependencies Treebank for Alaskan Gwich’in Matthew Kirk Andrews, Cagri Coltekin • Quid verbumst? Applying a definition of word to Latin in Universal Dependencies Flavio Massimiliano Cecchini • Introducing KIParla Forest: seeds for a UD annotation of interactional syntax Ludovica Pannitto, Eleonora Zucchini, Silvia Ballarè, Cristina Bosco, Caterina Mauri, Manuela Sanguinetti • Word Order Variation in Spoken and Written Corpora: A Cross-Linguistic Study of SVO and Alternative Orders Nives Hüll, Kaja Dobrovoljc • A morpheme-based treebank for Gbaya, an Ubanguian language of Central Africa Roulon-Doko Paulette, Sylvain Kahane, Bruno Guillaume • UD Annotation of Experience Clauses in Tigrinya Michael Gasser, Nazareth Amlesom Kifle 18:00 Guided Tour of Ljubljana 8 THURSDAY, 28 August 2025 Session 7 – DepLing Chair: Joakim Nivre 09:00 – 09:50 Keynote: Dan Zeman (Charles University, Prague) Auxiliaries across Languages and Frameworks 09:50 – 10:10 A corpus-driven description of OV order in Archaic Chinese Qishen WU, Santiago Herrera, Pierre Magistry, Sylvain Kahane 10:10 – 10:30 Periphrastic Verb Forms in Universal Dependencies Lenka Krippnerová, Daniel Zeman 10:30 – 11:00 Coffee Break Session 8 – DepLing Chair: Bruno Guillaume 11:00 – 11:15 Tracing Syntactic Complexity: Exploring the Evolution of Average Dependency Length Across Three Centuries of Scientific English Marie-Pauline Krielke, Diego Alves, Luigi Talamo 11:15 – 11:30 Modeling Syntactic Dependencies in Southern Dutch Dialects Loic De Langhe, Jasper Degraeuwe, Melissa Farasyn, Veronique Hoste 11:30 – 11:45 Assessing the Agreement Competence of Large Language Models Alba Táboas García, Leo Wanner 11:45 – 12:00 Genre Variation in Dependency Types: A Two-Level Genre Analysis Using the Czech National Corpus Xinying Chen, Miroslav Kubát 12:00 – 12:15 Distance and Projectivity as Predictors of Sentence Acceptability in Free Word Order Languages Kirill Chuprinko, Artem Novozhilov, Arthur Stepanov 12:15 – 12:30 Head-initial and head-Final coordinate structures in two annotation schemes of dependency grammar Timothy John Osborne, Chenchen Song 12:30 – 14:00 Lunch (Cafeteria) Session 9 – TLT Chair: Heike Zinsmeister 14:00 – 14:50 Keynote: Amir Zeldes (Georgetown University) Subject prominence revisited: What makes entities salient? 14:50 – 15:10 Legal-CGEL: Analyzing Legal Text in the CGELBank Framework Brandon Waldon, Micaela Wells, Devika Tiwari, Meru Gopalan, Nathan Schneider 15:10 – 15:30 Status of morphosyntactic features Illustration with written and spoken French UD treebanks Sylvain Kahane, Bruno Guillaume, Léna Brun, Simeng Song 15:30 – 16:00 Coffee Break 9 Session 10: Joint Poster Session B Chair: Stefanie Dipper 16:00 – 16:30 Lightning talks (2 min per poster) Location: Red Hall 16:30 – 17:30 Poster Session Location: Lobby • Universal Dependencies for the Alemannic Alsatian Dialects Barbara Hoff, Nathanaël Beiner, Delphine Bernhard • Expanding the Universal Dependencies Ancient Hebrew Treebank with Constituency Data Daniel G. Swanson • Graph Databases for Fast Queries in UD Treebanks Niklas Deworetzki, Peter Ljunglöf • Segmentation of Sino-origin words to enhance the representation of Korean and Japanese in S/UD-format treebanks Raoul Blin, Jinnam Choi • A New Hebrew Universal Dependency Treebank: The First Treebank of Post-Rabbinic Historical Hebrew Rachel Tal, Shlomit Fuchs, Orly Albeck, Elisheva Brauner, Yitzchak Lindenbaum, Ephraim Meiri, Avi Shmidman • Universal Dependency Treebank for a low-resource Dardic Language: Torwali Naeem Uddin, Daniel Zeman • Syntax of referents of relative markers: Evidence from a corpus of learner English Izabela Czerniak, Debopam Das • A Typology of Non-Projective Patterns in Unas's and Teti's Pyramid Texts Roberto A. Diaz Hernandez • Dependency Analysis of Chinese Comparative Sentences Zexin Liu • Dative alternations in less-researched syntactic patterns of standard Croatian Matea Andrea Birtić, Siniša Runjaić, Robert Sviben • A Quantitative Study of Subject-Predicate-Object Word Class Composition in vernacular Chinese Based on Dependency Grammar Bingli Liu, Yiyi Zhao • Syntactic units and their length distributions: A case study in Czech Michaela Nogolová, Michaela Koščová, Jan Macutek, Radek Cech • Modeling the Law of Abbreviation in Classical, Modern, and ChatGPT-Generated Chinese: A Power-Law Analysis of Structural Economy Jianwei Yan, Heng Chen 19:00 Conference Dinner 10 FRIDAY, 29 August 2025 Session 11 – TLT Chair: Amir Zeldes 09:00 – 09:20 ComparaTree: A Multi-Level Comparative Treebank Analysis Tool Luka Terčon, Kaja Dobrovoljc 09:20 – 09:40 Metaphorical Heads and Literal Dependents: Syntactic Properties of Metaphors in German Stefanie Dipper 09:40 – 10:00 Automatic Evaluation of Linguistic Validity in Japanese CCG Treebanks Asa Tomita, Hitomi Yanaka, Daisuke Bekki 10:00 – 10:15 Annotation of Chinese Light Verb Constructions within UMR Jingyi Li, Jin Zhao, Nianwen Xue, Shili Ge 10:15 – 10:30 STARK: A Toolkit for Dependency (Sub)Tree Extraction and Analysis Luka Krsnik, Kaja Dobrovoljc 10:30 – 11:00 Coffee Break Session 12 – QUASY Chair: Xinying Chen 11:00 – 11:50 Keynote: Xiaofei Lu (The Pennsylvania State University) The rhetorical and pragmatic functions of syntactically complex structures in academic and second language writing 11:50 – 12:10 On the Flatness, Non-linearity, and Branching Direction of Natural Language and Random Constituency Trees: Analyzing Structural Variation within and across Languages Taiga Ishii, Yusuke Miyao 12:10 – 12:30 Extraction of Contrastive Rules from Syntactic Treebanks: A Case Study in Romance Languages Santiago Herrera, Ioana-Madalina Silai, Bruno Guillaume, Sylvain Kahane 12:30 – 14:00 Lunch (Cafeteria) Session 13 – QUASY Chair: Jianwei Yan 14:00 – 14:15 A Quantitative Study of Syntactic Complexity across Genres: Dependency Distance in English and Chinese Yaqin Wang 14:15 – 14:30 Syntactic Complexity in L2 Reading: A Comparison of Adapted and Original Czech Texts Žaneta Stiborská, Michaela Nogolová, Xinying Chen, Miroslav Kubát 14:30 – 14:45 First Insights into the Syntax of Slovene Student Writing: A Statistical Analysis of Šolar 3.0 vs. Učbeniki 1.0 Tina Munda, Špela Arhar Holdt 11 14:45 – 15:00 Subject-Verb Agreement Alternations in Spanish Pseudopartitive Constructions: A Corpus Study Marina Cerebrinsky 15:00 – 15:15 A Computational Method for Analyzing Syntactic Profiles: The Case of the ELEXIS-WSD Parallel Sense-Annotated Corpus Jaka Čibej 15:15 – 15:30 Syntactic Complexity and News Credibility in Czech Media Miroslav Kubát, Xinying Chen, Michaela Nogolová, Michal Místecký 15:30 – 16:00 Coffee Break Session 14: Joint Poster Session C Chair: Miroslav Kubát 16:00 – 16:20 Lightning talks (2 min per poster) Location: Red Hall 16:20 – 17:20 Poster Session Location: Lobby • Degree centrality as a measure of robustness of dependency structures of the sentences in a large-scale learner corpus of English Masanori Oya • Application of Existing Readability Methods to the Ukrainian Language: A Comprehensive Study Serhii D Prykhodchenko, Oksana Yu. Prykhodchenko • The Interplay of Noun Phrase Complexity and Modification Type in Scientific Writing Isabell Landwehr • Predictability Effects of Spanish-English Code-Switching: A Directionality and Part of Speech Analysis Josh Higdon, Valeria Pagliai, Zoey Liu • Do Multilingual Transformers Encode Paninian Grammatical Relations? A Layer-wise Probing Study Akshit Kumar, Dipti Sharma, Parameswari Krishnamurthy • «Are you Afraid of Ghosts?» A Proposal for Busting Predicate Ellipsis in Universal Dependencies Claudia Corbetta, Federica Iurescia, Marco Carlo Passarotti • Case Syncretism in Kasavakan Puyuma: A Field Data Analysis of Noun Phrase Markers Deborah Watty, Yung-Jui Yao, Jens N. Watty • How to Create Treebanks without Human Annotators -- An Indigenous Language Grammar Checker for Treebank Construction Linda Wiechetek, Flammie A Pirinen, Maja Lisa Kappfjell • An intonosyntactic treebank for spoken French: What is new with Rhapsodie? Maria Paz Botero-Garcia, Emmett Strickland, Bruno Guillaume, Sylvain Kahane, Anne Lacheret-Dujour 17:20 – 17:30 Closing Session 18:00 - 18:30 Guided Tour of National and University Library 12 IWPT 2025 18th International Conference on Parsing Technologies Abstracts Keynote Isabel Papadimitriou (Harvard University) What Can We Learn from Language Models? This talk will examine how linguistic theory can benefit from the recent sur-prising successes of language models in modeling human language pro-duction. Language models provide linguists with an unprecedented em-pirical tool to expand and test our theoretical hypotheses about language. I will go over two main methodologies for taking advantage of language models as an empirical tool. Firstly, examining language model internals as functional theories for how linguistic information can be represented in ways that lead to linguistic capabilities. Secondly, using model training as an empirical testbed, examining what kinds of environments make statis-tical language learning possible or harder. Both methodologies showcase the importance of developing empirical paradigms that narrow the gap between computational methods and linguistic concerns in order to make language models able to help us expand theoretical horizons. 14 Hiroshi Matsuda, Chunpeng Ma, Masayuki Asahara Step-by-step Instructions and a Simple Tabular Output Format Improve the Dependency Parsing Accuracy of LLMs Recent advances in large language models (LLMs) have enabled impressive performance in various tasks. However, standard prompting often strug-gles to produce structurally valid and accurate outputs, especially in de-pendency parsing. We propose a novel step-by-step instruction strategy, where universal part-of-speech tagging precedes the prediction of syntac-tic heads and dependency labels, and a simplified CoNLL-U like output for-mat, our method achieves state-of-the-art accuracy on Universal Depend-encies datasets across 17 languages without hallucination or contamination. We further show that multilingual fine-tuning simultaneously improves cross-language generalization performance. Our results highlight the ef-fectiveness of explicit reasoning steps in LLM-based parsing and offer a scalable, format-consistent alternative to bracket-based approaches. Jinman Zhao, Gerald Penn An Efficient Parser for Bounded-Order Product-Free Lambek Categorial Grammar via Term Graph Lambek Categorial Grammar (LCG) parsing has been proved to be an NP-complete problem. However, in the bounded-order case, the complex-ity can be reduced to polynomial time. Fowler (2007) first introduced the term graph, a simple graphical representation for LCG parsing, but his algo-rithm for using it remained largely inscrutable. Pentus (2010) later proposed a polynomial algorithm for bounded-order LCG parsing based on cyclic lin-ear logic, yet both approaches remain largely theoretical, with no open-source implementations available. In this work, we combine the term-graph representation with insights from cyclic linear logic to develop a novel pars-ing algorithm for bounded-order LCG. Furthermore, we release our parser as an open-source tool. 15 Gabriel H. Gilbert, Rolando Coto-Solano, Sally Akevai Nicholas, Lauren Houchens, Sabrina Barton, Trinity Pryor Crosslingual Dependency Parsing of Hawaiian and Cook Islands Māori using Universal Dependencies This paper presents the first Universal Dependency (UD) treebank for 'Ōlelo Hawai'i (Hawaiian). We discuss some of the difficulties in describing Hawaiian grammar using UD, and train models for automatic parsing. We also combined this data with UD parses from another Eastern Polynesian language, Cook Islands Māori, to train a crosslingual Polynesian parser us-ing UDPipe2. The crosslingual parser produced a statistically significant improvement of 2.4% in the labeled attachment score (LAS) when parsing Hawaiian, and this improvement didn’t produce a negative impact in the LAS of Cook Islands Māori. We will use this parser to accelerate the linguis-tic documentation of Hawaiian. Paul He, Gerald Penn CCG Revisited: A Multilingual Empirical Study of the Kuhlmann-Satta Algorithm We revisit the polynomial-time CCG parsing algorithm introduced by Ku-hlmann & Satta (2014), and provide a publicly available implementation of it. We evaluate its empirical performance against a naive CKY-style pars-er across the Parallel Meaning Bank (PMB) corpus. While the fast parser is slightly slower on average, relative to the size of the PMB, but the trend improves as a function of sentence length, and the PMB is large enough to witness an inversion. Our analysis quantifies this crossover and highlights the importance of derivational context decomposition in practical parsing scenarios. 16 John Bauer, Christopher D Manning High-Accuracy Transition-Based Constituency Parsing Constituency parsers have improved markedly in recent years, with the F1 accuracy on the venerable Penn Treebank reaching 96.47, half of the error rate of the first transformer model in 2017. However, while dependency parsing frequently uses transition-based parsers, it is unclear whether tran-sition-based parsing can still provide state-of-the-art results for constitu-ency parsing. Despite promising work by Liu and Zhang in 2017 using an in-order transition-based parser, recent work uses other methods, mainly CKY charts built over LLM encoders. Starting from previous work, we im-plement self-training and a dynamic oracle to make a language-agnostic transition-based constituency parser. We test on seven languages; using Electra embeddings as the input layer on Penn Treebank, with a self-training dataset built from Wikipedia, our parser achieves a new SOTA F1 of 96.61. 17 UDW 2025 8th Universal Dependencies Workshop Abstracts Keynote Miryam de Lhoneux (KU Leuven) Typologically informed NLP evaluation NLP has a long history of focusing mainly on English. While increasing ef-forts are being made towards making language technology more multilin-gual, English remains the language on which NLP technology is developed first, and applied to other languages next, which inevitably leads to degrad-ed performance compared to English. This talk is about reversing this trend and putting multilinguality at the core of NLP, rather than at the periphery. I describe how typology can inform NLP evaluation, using our recently pro-posed language sampling framework. A strong limitation of the approach is the state of multilingual datasets, which tend to lack coverage, be ma-chine-translated or have questionable quality. UD is an exception, and I em-phasize the role it can play in establishing best practices in multilingual NLP evaluation. 19 Adriana Silvina Pagano, Patricia Chiril, Elisa Chierchiello, Cristina Bosco TreEn: A Multilingual Treebank Project on Environmental Discourse The increasing complexity of environmental discourse is directly propor-tional to the growing complexity of environmental debates present today in all communication media. While linguistic and communication studies have been pursued on this discourse, the development of computational linguistic tools and resources dedicated to support its analysis and interpre-tation is still very incipient. For one, no morphosyntactic resources specific to the environmental domain can be found on major platforms and repos-itories. This paper introduces TreEn, a multilingual treebank project in pro-gress which compiles texts on environmental discourse produced in dif-ferent conversational and communication contexts. In particular, it reports on the parallel component of the project and discusses issues faced during sentence-level alignment between original and translated texts, annotation of texts following UD guidelines, and labeling entities drawing on an on-tology of environmental-related topics. This novel resource is expected to support environmental discourse analysis by providing morphological and syntactical data to enable cross-language and cross-cultural comparison based on the semantics of the entities annotated in the treebank. 20 Stavros Bompolas, Stella Markantonatou, Angela Ralli, Antonios Anasta-sopoulos Crossing Dialectal Boundaries: Building a Treebank for the Dialect of Lesbos through Knowledge Transfer from Standard Modern Greek This paper presents the first treebank for the dialect of Lesbos, a low-re-source living Northern variety of Modern Greek (MG), annotated according to the Universal Dependencies (UD) framework. So far, the only dialectal treebank available for Greek developed with cross-dialectal knowledge transfer is an East Cretan one, which belongs to the same Southern branch as Standard Modern Greek (SMG). Our study investigates the effective-ness of cross-dialectal knowledge transfer between dialectologically less similar varieties of the same language by leveraging knowledge from SMG to annotate the Northern dialect of Lesbos. We describe the annotation process, present the resulting treebank, inject additional linguistic knowl-edge to enhance the results, and evaluate the effectiveness of cross-dia-lectal knowledge transfer for active annotation. Our findings contribute to a better understanding of how dialectal variation within language families affects knowledge transfer in the UD framework, with implications for oth-er low-resource varieties. 21 Jamie Yates Findlay, Dag Trygve Truslew Haug Negation in Universal Dependencies In this paper we study the representation of negation in UD treebanks. We show that the existing annotations are often inconsistent with the guide-lines and that there are ill-motivated differences in annotation of construc-tions across and even within languages. Moreover, we argue that even if the annotation of the two negation-related features (Polarity=Neg and PronType=Neg) were consistent, these two features would be inadequate for straightforwardly expressing the semantics of negation because they relate to the word level only and hence to form rather than meaning. We therefore propose to add two features, Negated=+ and DoubleNegated=+, which directly encode when a predicate is semantically under negation, and thereby allow a straightforward semantic interpretation of a UD parse in terms of negation. Amir Zeldes, Nina Speransky, Nicholas E. Wagner, Caroline T. Schroeder A UD Treebank for Bohairic Coptic Despite recent advances in digital resources for other Coptic dialects, espe-cially Sahidic, Bohairic Coptic, the main Coptic dialect for pre-Mamluk, late Byzantine Egypt, and the contemporary language of the Coptic Church, remains critically under-resourced. This paper presents and evaluates the first syntactically annotated corpus of Bohairic Coptic, sampling data from a range of works, including Biblical text, saints’ lives and Christian ascetic writ-ing. We also explore some of the main differences we observe compared to the existing UD treebank of Sahidic Coptic, the classical dialect of the lan-guage, and conduct joint and cross-dialect parsing experiments, revealing the unique nature of Bohairic as a related, but distinct variety from the more often studied Sahidic. 22 Roberto A. Diaz Hernandez, Daniel Zeman Annotation of Relative Forms in the Egyptian-UJaen Treebank Relative forms are a distinctive morphosyntactic feature of Earlier Egyp-tian. They pose a challenge when annotating them according to the Uni-versal Dependencies approach. They are adjective finite verb forms, and therefore they have both verb and adjective properties, but they can also be used as nouns. The aim of this paper is to discuss the morphosyntactic methodology applied to their annotation in the Egyptian-UJaen treebank. Jaap Jumelet, Leonie Weissweiler, Arianna Bisazza MultiBLiMP 1.0: A Massively Multilingual Benchmark of Linguistic Minimal Pairs We introduce MultiBLiMP 1.0, a massively multilingual benchmark of lin-guistic minimal pairs, covering 101 languages, 6 linguistic phenomena and containing more than 125,000 minimal pairs. Our minimal pairs are created using a fully automated pipeline, leveraging the large-scale linguistic re-sources of Universal Dependencies and UniMorph. MultiBLiMP evaluates abilities of LLMs at an unprecedented multilingual scale, and highlights the shortcomings of the current state-of-the-art in modelling low-resource languages. 23 Jessica K. Ivani, Kira Tulchynska Universal Dependencies for Suansu This contribution presents the Naga-Suansu Universal Dependencies (UD) treebank, the first resource of this kind for Suansu, an endangered and un-derdocumented Tibeto-Burman language spoken in Northeast India. This treebank follows the UD annotation framework. We describe the corpus composition, data sources, and annotation process, outlining the general structure of the treebank. In addition, we highlight morphosyntactic chal-lenges where Suansu grammar does not fit neatly into the UD annotation schema and propose adaptations to better capture its structural proper- ties. As the first Tibeto-Burman language included in the UD project, the Naga-Su-ansu treebank serves several purposes: it contributes to the documentation and preservation of endangered languages, enables the understanding of cross-linguistic variation, and supports future research efforts in refining UD annotation practices for South and Southeast Asian languages. Lauren Levine, Junghyun Min, Amir Zeldes Building UD Cairo for Old English in the Classroom In this paper we present a sample treebank for Old English based on the UD Cairo sentences, collected and annotated as part of a classroom curric-ulum in Historical Linguistics. To collect the data, a sample of 20 sentences illustrating a range of syntactic constructions in the world’s languages, we employ a combination of LLM prompting and searches in authentic Old English data. For annotation we assigned sentences to multiple students with limited prior exposure to UD, whose annotations we compare and ad-judicate. Our results suggest that while current LLM outputs in Old English do not reflect authentic syntax, this can be mitigated by post-editing, and that although beginner annotators do not possess enough background to complete the task perfectly, taken together they can produce good results and learn from the experience. We also conduct preliminary parsing exper-iments using Modern English training data, and find that although perfor-mance on Old English is poor, parsing on annotated features (lemma, hy-perlemma, gloss) leads to improved performance. 24 Arianna Masciolini, Aleksandrs Berdicevskis, Maria Irena Szawerna, Elena Volodina Annotating Second Language in Universal Dependencies: a Review of Current Practices and Directions for Harmonized Guidelines Universal Dependencies (UD) is gaining popularity as an annotation stand-ard for second language (L2) material. Grammatical errors and other inter-language phenomena, however, pose significant challenges that official guidelines only address in part. In this paper, we give an overview of cur-rent annotation practices and provide some suggestions for harmonizing guidelines for learner corpora. Joakim Nivre, William Croft Reference and Modification in Universal Dependencies Is the framework of Universal Dependencies (UD) compatible with findings from linguistic typology? To address this question, we need to systemati-cally review how UD represents linguistic constructions in the world’s lan-guages, and how it handles the range of morphosyntactic variation attest-ed in linguistic typology. In this paper, we start this review by discussing reference and modification constructions. The review shows that, although UD can represent all major constructions in this area, there are a number of cases where UD categories do not align systematically with a typological classification of constructions, and where constructional similarity is there-fore not transparent across languages. We also identify limitations in the representation of certain morphosyntactic strategies, notably indexation and linkers. To overcome these limitations, we propose a number of revi-sions that may be considered for future versions of UD. 25 Masanori Oya UD Treebanks for Esperanto as a natural language This paper describes the details of UD-based morphological and syntac-tic annotations on Esperanto texts to construct its small-scale UD treebank. Though it was created as an international auxiliary language, Esperanto has increasingly been studied as a natural language both in linguistics and in NLP. This paper introduces the detail of manual annotation of UD morpho-logical and relational tags and describes how the frequencies of these tags differ across the treebanks and discusses the possibility of future research of Esperanto as a natural language. Xiulin Yang, Zhuoxuan Ju, Lanni Bu, Zoey Liu, Nathan Schneider UD-English-CHILDES: A Collected Resource of Gold and Silver Universal Dependencies Trees for Child Language Interactions CHILDES is a widely used resource of transcribed child and child-directed speech. This paper introduces UD-English-CHILDES, the first officially re-leased Universal Dependencies (UD) treebank. It is derived from previously dependency-annotated CHILDES data, which we harmonize to follow uni-fied annotation principles. The gold-standard trees encompass utterances sampled from 11 children and their caregivers, totaling over 48K sentences (236K tokens). We validate these gold-standard annotations under the UD v2 framework and provide an additional 1M silver-standard sentences, of-fering a consistent resource for computational and linguistic research. 26 John Bauer, Sakeena Shah, Muhammad Shaheer, Mir Afzal Ahmed Talpur, Zubair Sanjrani, Sarwat Qureshi, Shafi M Pirzada, Christopher D Manning, Mutee U Rahman Universal Dependencies for Sindhi Sindhi is an Indo-Aryan language spoken primarily in Pakistan and India by about 40 million people. Despite this extensive use, it is a low resource lan-guage for NLP tasks, with few datasets or pretrained embeddings available. In this work, we explore linguistic challenges for annotating Sindhi in the UD paradigm, such as language-specific analysis of adpositions and verb forms. We use this analysis to present a newly annotated dependency tree-bank for Universal Dependencies, along with pretrained embeddings and an annotation pipeline specifically for Sindhi annotation. Kira Tulchynska, Sylvanus Job, Alena Witzlack-Makarevich Universal Dependencies Treebank for Khoekhoe (KDT) This paper reports on the development of the first dependency treebank for Khoekhoe (KDT). Khoekhoe (Khoe-Kwadi, Namibia) is a low-resource language with few linguistic and computational resources available pub-licly. This treebank consists of 29k words across six texts taken from various registers. It includes a substantial portion of spoken conversational data. These sentences were annotated manually according to the Universal De-pendencies framework. In this paper, apart from presenting the strategies that have been followed to create the treebank, we also discussed some challenging morphological features and syntactic constructions found in the corpus and outlined how we have handled them using the current Uni-versal Dependencies specification. 27 Arofat Akhundjanova, Furkan Akkurt, Bermet Chontaeva, Soudabeh Eslami, Cagri Coltekin Parallel Universal Dependencies Treebanks for Turkic Languages We introduce the first fully aligned and manually annotated parallel Uni-versal Dependencies (UD) treebanks for four Turkic languages: Azerbaijani, Kyrgyz, Turkish, and Uzbek. These resources currently consist of 148 strate-gically selected sentences that illustrate typologically significant morpho-syntactic phenomena across these related yet distinct languages. These parallel treebanks enable systematic comparative studies of Turkic syntax and may be instrumental in cross-lingual NLP applications. All treebanks are available as part of UD v2.16. Andrew Thomas Dyer, Colleen Alena O’Brien Towards better annotation practices for symmetrical voice in Universal Dependencies Austronesian languages exhibit features that are challenging for Universal Dependencies: most notably, the symmetric voice system, whereby agent, patient, and instrumental arguments (among others) can be the pivot of a transitive structure – complicating the usual assumption that subjects of transitive sentences are semantic agents, and objects semantic patients. To showcase our ideas of how to address the representation of such systems in Universal Dependencies, we introduce a small treebank of sentences from texts and elicitation sessions in Gorontalo, an Austronesian language of Su-lawesi (Indonesia), which exhibits a Philippine-type voice system. We dis-cuss the annotation guidelines for this language, and the extensions of the Universal Dependencies guidelines that are needed to accommodate this and other Austronesian languages. 28 Magali Sanches Duran, Elvis A. de Souza, Maria das Graças Volpe Nunes, Adriana Silvina Pagano, Thiago A. S. Pardo Extending the Enhanced Universal Dependencies – addressing subjects in pro-drop languages Enhanced Universal Dependencies (EUD) serve as a crucial link between syntax and semantics. Beyond basic syntactic dependencies, EUD provides valuable refined logical connections for downstream tasks such as semantic role labeling, coreference resolution, information extraction, and question answering. The original EUD framework defines six types of relationships, but this paper introduces an extension designed to address subject propa-gation in pro-drop languages. This “Extended EUD” proposal increases the number of relationships that may be annotated in sentences, improving lin-guistic representation. Additionally, we report our experiments on a corpus of Portuguese (a pro-drop language), which we make publicly available to the research community. Matthew Kirk Andrews, Cagri Coltekin Developing a Universal Dependencies Treebank for Alaskan Gwich’in This paper presents a Universal Dependencies (UD) treebank of Gwich’in, a severely endangered Athabascan language. The treebank, developed using instructional materials and dictionaries, includes 313 annotated sentences. This paper discusses the methodology used to construct the treebank, the linguistic challenges faced, and the implications of annotating a polysyn-thetic, morphologically complex language within the Universal Dependen-cies framework. The treebank was released with UD version 2.15 and avail- able at https://github.com/UniversalDependencies/UD_Gwichin-TueCL/ . 29 Flavio Massimiliano Cecchini Quid verbumst? Applying a definition of word to Latin in Universal Dependencies Words, more specifically “syntactic words”, are at the centre of a depend-ency-based approach like Universal Dependencies. Nonetheless, its guide-lines do not make explicit how such a word should be defined and iden-tified, and so it happens that different treebanks use different standards to this end. To counter this vagueness, the community has been recently discussing a definition put forward in (Haspelmath, 2023) which is not fully uncontroversial. This contribution is a preliminary case study that tries its hand at concretely applying this definition (except for compounds) to Latin in order to gain more insights about its operability and groundedness. This is helped by the spread of Latin over many treebanks, the presence of good linguistic resources to analyse it, and a linguistic type which is probably not fully considered in (Haspelmath, 2023). On the side, this work shows once more the difficulties of turning theoretical definitions into working direc-tives in the realm of linguistic annotation. Qizhen Yang ShUD: the First Shanghainese Universal Dependency Treebank This paper introduces ShUD, the first Universal Dependencies (UD) tree-bank for Shanghainese, a Wu Chinese variant spoken by approximately 14 million people but severely under-resourced in NLP. The treebank is built through a scalable annotation pipeline that exploits grammatical parallels between Shanghainese and Mandarin. Our pipeline also provides a prac-tical strategy for bootstrapping resources for other Chinese dialects. We documented syntactic phenomena unique to Shanghainese within the UD framework and fine-tuned a dependency parser using our annotated treebank, contributing a foundation to both NLP tool development and cross-linguistic syntactic research. 30 DepLing 2025 8th International Conference on Dependency Linguistics Abstracts Keynote Daniel Zeman (Charles University, Prague) Auxiliaries across Languages and Frameworks In my talk, I will discuss the status of auxiliaries (i.e., auxiliary verbs as well as uninflected non-verbal particles with auxiliary function) in dependency treebanks. The focus will be on two frameworks, Universal Dependencies (UD) and the Prague family of treebanks, rooted in the Functional Gener-ative Description. However, I will occasionally show examples from other treebanks and frameworks, encountered during the HamleDT harmoniza-tion effort. Besides looking at various treatments of auxiliaries in different annotation schemes, I will also discuss the question of delimiting the set of auxiliaries in individual languages (or, more exactly, the set of words that receive the special treatment in the respective annotation schemes). Various gram-matical tests may be available, but sometimes the auxiliaries are simply enumerated by traditional school grammar. Moreover, there is a scale of categories ranging from pure grammatical auxiliaries through modals and phase verbs to various semantically bleached verbs that take other verbs as complements, yet their contribution is lexical rather than grammatical and their syntactic behavior shows no anomalies. All these aspects complicate finding a unified definition that would be applicable in a multi-lingual data-set, such as HamleDT or UD. In the last part of the talk, I will show some examples of contrastive cross-lin-guistic studies that would benefit from comparably defined auxiliaries. I will show how we encourage comparability in UD using a common database of auxiliaries, and I will argue that it has the potential to become a useful resource of its own. 32 Ludovica Pannitto, Eleonora Zucchini, Silvia Ballarè, Cristina Bosco, Caterina Mauri, Manuela Sanguinetti Introducing KIParla Forest: seeds for a UD annotation of interactional syntax The present project endeavors to enrich the linguistic resources available for Italian by introducing KIParla Forest, a treebank for the KIParla corpus - an existing and well-known resource for spoken Italian. This article con-textualizes the project, describes the treebank creation process and design choices, and highlights future plans for next improvements. Nives Hüll, Kaja Dobrovoljc Word Order Variation in Spoken and Written Corpora: A Cross-Linguistic Study of SVO and Alternative Orders This study investigates word order variation in spoken and written cor-pora across five Indo-European languages: English, French, Norwegian (Nynorsk), Slovenian, and Spanish. Using Universal Dependencies tree-banks, we analyze the distribution of six canonical word orders (SVO, SOV, VSO, VOS, OSV, OVS). Our results reveal that spoken language consistently exhibits greater word order flexibility than written language. This increased flexibility manifests as a decrease in the dominant SVO pattern and a rise in alternative orders, though the extent of this variation differs across lan-guages. Morphologically rich languages such as Slovenian and Spanish show the most pronounced shifts, while English remains syntactically rigid across modalities. These findings support the claim that modality signifi-cantly affects syntactic realizations and highlight the need for typological studies to account for spoken data. 33 Roulon-Doko Paulette, Sylvain Kahane, Bruno Guillaume A morpheme-based treebank for Gbaya, an Ubanguian language of Central Africa In this paper, we present the first treebank for Gbaya, a language from the under-resourced Niger-Congo family. The language has a rich system of tonal morphemes and virtually no affixes. The dependency analysis is based on a morpheme-based tokenisation and the treebank is also distributed in word-based Universal Dependencies version. Several constructions are dis-cussed in the paper: genitive construction, clause coordination, sentence particles, adverbial and relative clauses, serial verb constructions, reported speech, topicalization, and focalization. Michael Gasser, Nazareth Amlesom Kifle UD Annotation of Experience Clauses in Tigrinya We are developing a treebank for Tigrinya within the Universal Dependen-cy (UD) framework. UD proposes a set of universal grammatical relations to capture dependency relations between words in any language. However, for some classes of verbs it is not a straightforward matter to know what grammatical relations the verbs are categorized for. In this paper we dis-cuss the decisions we have had to make for the annotation of arguments of experience verbs in the Semitic language Tigrinya, which exhibit a num-ber of unusual morphosyntactic properties. We describe a classification of experience verb roots in the language, based on the various ways in which the core experiencer and stimulus arguments are realized syntactically and morphologically and on which valence-changing operations the roots per-mit. We supplement our analysis with data from a morphological analysis of a Tigrinya corpus. 34 Qishen WU, Santiago Herrera, Pierre Magistry, Sylvain Kahane A corpus-driven description of OV order in Archaic Chinese This paper presents a quantitative study of Object-Verb (OV) order in Ar-chaic Chinese based on a Universal Dependencies (UD) treebanks. Treating word order as a binary choice (OV vs VO), we train a sparse logistic-regres-sion classifier that selects the most salient syntactic features needed for an accurate prediction to investigate the specific syntactic contexts allowing OV word order and to identify to what extent do these factors favour this order. The ranked features are understood as interpretable rules, and their coverage and precision as quantitative properties of each rule. The ap-proach confirms earlier qualitative findings (e.g. pronoun object fronting and negation favour OV) and uncovers new contrasts in word order be-tween different reflexive pronouns. It also identifies annotation errors that we corrected in the final analysis, illustrating how the quantitative models, combined with fine-grained corpus analysis, can improve treebank quali-ty. Our study demonstrates that lightweight machine-learning techniques applied to an existing syntactic resource can reveal fine-grained patterns in historical word order and this can be reapplied to other languages. Lenka Krippnerová, Daniel Zeman Periphrastic Verb Forms in Universal Dependencies We propose a generalization of the morphological annotation in Univer-sal Dependencies (UD) to phrases spanning multiple words, possibly dis-continuous. Our focus area is that of periphrastic tenses, voices and other forms, typically consisting of a non-finite content verb combined with one or more auxiliaries; however, the same approach can be applied to other morphosyntactic constructions. We present a software tool that can detect periphrastic verb forms, extract the relevant morphological features from member words and combine them into new, phrase-level annotation. The tool currently detects periphrastic verb forms in 15 Slavic languages that are represented in UD and it is easily adaptable to other constructions and languages. Both the tool and the processed Slavic data are freely available. 35 Marie-Pauline Krielke, Diego Alves, Luigi Talamo Tracing Syntactic Complexity: Exploring the Evolution of Average Dependency Length Across Three Centuries of Scientific English We present a diachronic analysis of syntactic change in a corpus covering over 300 years (1665–1996) of scientific English, annotated with Universal Dependencies (UD) and Dependency Length (DL). We trace the develop-ment of average Dependency Length (aDL) as a measure of syntactic com-plexity in scientific English between 1665 and 1996. We describe the con-struction of the corpus and report on the evaluation of the UD annotation. We find that aDL initially decreases toward the 19th century, but then in-creases significantly in the 20th century. We show that this highly aggregate measure of aDL masks the underlying mechanisms driving changes in syn-tactic complexity. A more fine-grained analysis of the dependency relations involved in these changes reveals that the increasing use of (multi-word) compounds is a dominant source of long, leftward-expanded noun phras-es. This leads to an expansion of syntactic dependencies both within and beyond the noun phrase. The results offer a new perspective on syntactic complexity, shifting the focus from the sentence level to the phrasal level. 36 Loic De Langhe, Jasper Degraeuwe, Melissa Farasyn, Veronique Hoste Modeling Syntactic Dependencies in Southern Dutch Dialects Dependency parsing of non-normative language varieties remains a chal-lenge for modern NLP. While contemporary parsers excel at standardized languages, dialectal variation -- especially in function words, conjunctives, and verb clustering -- introduces syntactic ambiguity that disrupts tradition-al parsing approaches. In this paper, we conduct a quantitative evaluation of syntactic dependencies in Southern Dutch dialects, leveraging a standardized dialect corpus to isolate syntactic effects from lexical variation. Using a neural biaffine dependency parser with various mono- and multilingual transform-er-based encoders, we benchmark parsing performance on standard Dutch, dialectal data, and mixed training sets. Our results demonstrate that incorpo-rating dialect-specific data significantly enhances parsing accuracy, yet certain syntactic structures remain difficult to resolve, even with dedicated adapta-tion. These findings highlight the need for more nuanced parsing strategies and improved syntactic modeling for non-normative language varieties. Alba Táboas García, Leo Wanner Assessing the Agreement Competence of Large Language Models While the competence of LLMs to cope with agreement constraints has been widely tested in English, only a very limited number of works deals with morphologically rich(er) languages. In this work, we experiment with 25 mono- and multilingual LLMs, applying them to a collection of more than 5,000 test examples that cover the main agreement phenomena in three Romance languages (Italian, Portuguese, and Spanish) and one Slavic Language (Russian). We identify which of the agreement phenomena are most difficult for which models and challenge some common assumptions of what makes a good model. The test suites into which the test exam-ples are organized are openly available and can be easily adapted to other agreement phenomena and other languages for further research. 37 Xinying Chen, Miroslav Kubát Genre Variation in Dependency Types: A Two-Level Genre Analysis Using the Czech National Corpus This paper examines how dependency type distributions vary across gen-res in the Czech National Corpus (SYN2020). Using a two-level genre clas-sification, broad categories and fine-grained subgenres, we identify gen-re-sensitive syntactic patterns through relative frequency analysis. The results show that some dependency types (e.g. Atr ‘attribute’) vary consist-ently across genres, while others (e.g. ExD ‘part of discourse ellipsis’) show sensitivity only at the subgenre level. Our dependency-based approach extends common multidimensional analyses based on lexical-grammatical co-occurrences, directly capturing syntactic evidence and improving inter-pretability. Our findings also highlight the importance of fine-grained gen-re distinctions in revealing syntactic variation. Kirill Chuprinko, Artem Novozhilov, Arthur Stepanov Distance and Projectivity as Predictors of Sentence Acceptability in Free Word Order Languages This study investigates how two core metrics rooted in Dependency Gram-mar, Minimal Dependency Distance (MDD) and projectivity, predict sen-tence acceptability in Russian and Serbo-Croatian. Using exhaustive word order permutations in controlled five-word sentences, we model how these metrics relate to acceptability judgments in two psycholinguistic ex-periments. While MDD has been widely studied as a processing constraint, projectivity violations have received less attention in acceptability mod-eling. We show that both significantly affect judgments, with projectivity playing a surprisingly strong role. In addition, Serbo-Croatian’s rigid clitic placement provides a natural test case for disentangling grammatical from processing constraints. Our findings offer a computationally precise, de-pendency-based model of acceptability that advances cognitively ground-ed language modeling for free word order languages. 38 Timothy John Osborne, Chenchen Song Head-initial and head-Final coordinate structures in two annotation schemes of dependency grammar The Universal Dependencies (UD) and Surface-Syntactic Universal Depend-encies (SUD) annotation schemes view coordinate structures as head-ini-tial. This contribution argues that a more flexible approach to coordinate structures is linguistically motivated, one that sees coordinate structures as head-initial in greater head-initial structures and as head-final in greater head-final structures. Support for this flexible approach comes from two ar-eas: dependency distance and a nearness effect. In addition, two arguments that have been produced supporting the strictly head-initial approach are examined and refuted. Roberto A. Diaz Hernandez A Typology of Non-Projective Patterns in Unas’s and Teti’s Pyramid Texts Abstract: The aim of this paper is to study the use of non-projective struc-tures in Unas’s and Teti’s Pyramid Texts (ca. 2321–2279 BC) annotated in the Egyptian-UJaen treebank. It offers the first typology of non-projective pat-terns in Old Egyptian, and it discusses the causes for non-projectivity in the Old Egyptian language of Unas’s and Teti’s Pyramid Texts to conclude that non-projectivity is an exceptional phenomenon in these texts. 39 Zexin Liu Dependency Analysis of Chinese Comparative Sentences This paper examines the dependency structures of comparative sentences across various Chinese dialects. In equality comparatives, the comparative result is post-posed (R-back) in all Chinese dialects, which contrasts with English. Although Mandarin also follows the R-back pattern for superiori-ty comparatives, dialects such as Hong Kong Cantonese and Southern Min adopt an R-front type, similar to English. However, Southern Min lacks a comparative marker, while English’s comparative marker than dominates the standard of comparison. In contrast, the comparative marker in Canton-ese does not dominate the standard. Through the calculation of depend-ency distances and syntactic tests, we argue that when the comparative result is preposed, it dominates the standard of comparison. Conversely, when the comparative construction fol-lows an R-back type, the compar-ative marker dominates the comparative result. This analysis aligns closely with the annotation choices of the Surface-Syntactic Universal Dependen-cies (SUD), differing significantly from those of the Universal Dependen-cies (UD) model. Matea Andrea Birtić, Siniša Runjaić, Robert Sviben Dative alternations in less-researched syntactic patterns of standard Croatian Dative alternation in double object constructions is a frequently researched syntactic phenomenon, having been investigated across world languages. Consequently, even relatively smaller and under-resourced languages like Croatian have seen influential studies on the topic. Recent syntactic and se-mantic analyses of verbs in standard Croatian have identified less-explored instances of dative alternation. This contribution aims to describe the alter-nation between dative case and prepositional phrase for the non-agentive and intransitive uses of the verb služiti (‘to serve’), as well as the dative al-ternation for the agentive and transitive uses of the verb izbjeći (‘to avoid’). 40 Bingli Liu, Yiyi Zhao A Quantitative Study of Subject-Predicate-Object Word Class Composition in vernacular Chinese Based on Dependency Grammar The paper aims at studying the evolution of lexical composition of sub-ject-verb-object sentences in vernacular Chinese. Five corpora are con-structed for the Tang and Five Dynasties, Song Dynasty, Yuan and Ming Dynasties, Qing Dynasty, and the present contemporary era which lasts for more than 1,000 years. The syntactic structures of these sentences are labe-led, counted, and analyzed based on the theoretical foundation of depend-ency grammar, with the aim of investigating the evolution of the lexical cat-egory composition of the subject-predicate-object in vernacular Chinese over time. The results show that the ratio of nouns and pronouns in each period occupies the majority of the total number of subject lexemes, and the lexical composition of predicates has been very stable since ancient times, with verbal predicates accounting for the vast majority of predicates. Compared with the subject lexical composition, objects are richer and the lexical composition changes more slowly. 41 TLT 2025 23rd Workshop on Treebanks and Linguistic Theories Abstracts Keynote Amir Zeldes (Georgetown University) Subject prominence revisited: What makes entities salient? In this talk, I’ll explore what makes certain entities stand out in discourse — what we might call more or less “salient” — and how speakers systemat-ically identify them. Building on existing approaches to information struc-tural “aboutness”, subjecthood, Centering Theory and animacy hierarchies, I argue that salience goes beyond surface categories such as definiteness, pronominalization and grammatical function. It’s also shaped by deeper structures: distributional cues, discourse relations, hierarchical organization, genre conventions, and the communicative goals we infer from context. To get at this, I use a graded notion of salience based on how often entities are included in multiple human-written summaries of a text or conversation. Drawing on manually treebanked data from 24 different spoken and writ-ten genres in English, I ask: how is salience expressed for each entity men-tioned in a discourse? I’ll show that while traditional linguistic markers of salience all correlate with our salience scores to some extent, every rule has exceptions, and no single feature tells the whole story. Instead, salience cuts across all levels of linguistic structure, and the most informative theoretical model of the phenomenon must therefore combine cues from across mor-phosyntax, discourse structure, and functional pragmatics. 43 Luka Krsnik, Kaja Dobrovoljc STARK: A Toolkit for Dependency (Sub)Tree Extraction and Analysis We present STARK, a lightweight and flexible Python toolkit for extract-ing and analyzing syntactic (sub)trees from dependency-parsed corpora. By systematically slicing each sentence into interpretable syntactic units based on configurable parameters, STARK enables bottom-up, data-driv-en exploration of syntactic patterns at multiple levels of abstraction—from fully lexicalized constructions to general structural templates. It supports any CoNLL-U-formatted corpus and is available as a command-line tool, Python library, and interactive online demo, ensuring seamless integration into both exploratory and large-scale corpus workflows. We illustrate its functionality through case studies in noun phrase analysis, multiword ex-pression identification, and syntactic variation across corpora, demonstrat-ing its utility for a wide range of corpus-driven syntactic investigations. Sylvain Kahane, Bruno Guillaume, Léna Brun, Simeng Song Status of morphosyntactic features Illustration with written and spoken French UD treebanks Morphosyntactic features used in UD treebanks have different status. If most of them correspond to values of inflectional morphemes, some describe lexical subclasses or are just conventional names of polysemic morphemes. Syncretism is also a challenge, because exact values are only deductible from contextual information. We propose an attempt at clarification and an implementation in the treebanks of written and spoken French. 44 Barbara Hoff, Nathanaël Beiner, Delphine Bernhard Universal Dependencies for the Alemannic Alsatian Dialects We present the first corpus of Alsatian Alemannic dialects following Uni-versal Dependencies (UD) guidelines, a project which already covers many of the world’s languages. Standard languages are represented to a great-er extent than non-standard varieties in UD, and our corpus contributes to closing the gap in the lack of resources for Alsatian dialects by presenting the first UD treebank for these dialects, which are spoken in Northeast-ern France. Our corpus is annotated both with part-of-speech tags and dependency information, as well as French glosses and German lemmas, containing in total 975 sentences and 19,286 tokens, spanning over various text genres. In this article, we present our data, details of the annotation process, as well as some specific syntactic phenomena which differentiate and situate Alsatian with regards to both Standard German and some oth-er German non-standard varieties. The addition of this corpus to the UD project allows for a higher visibility of the Alemannic Alsatian dialects in linguistic research, and provides a valuable resource for research in many fields, including NLP, syntax and comparative Germanic linguistics. Daniel G. Swanson Expanding the Universal Dependencies Ancient Hebrew Treebank with Constituency Data This paper presents an effort to expand the annotation pipeline for the An-cient Hebrew Universal Dependencies treebank to make use of additional data, resulting in the addition of over 4000 sentences and roughly 100K words to the released treebank. The resulting treebank contains 5500 sen-tences and 145K words and the incorporation of converted constituency data has resulted in an annotation process which requires manual interven-tion in only around 15-20\% of sentences, even in previously unseen genres. 45 Niklas Deworetzki, Peter Ljunglöf Graph Databases for Fast Queries in UD Treebanks We investigate if labeled property graphs, and graph databases, can be an useful and efficient way of encoding UD treebanks, to facilitate searching for complex syntactic phenomena. We give two alternative encodings of UD treebanks into the off-the-shelf graph database Neo4j, and show how to translate syntactic queries into the graph query language Cypher. Our evaluation shows that graph databases can improve query times by several orders of magnitude, compared to existing approaches. Raoul Blin, Jinnam Choi Segmentation of Sino-origin words to enhance the representation of Korean and Japanese in S/UD-format treebanks In the Japanese and Korean S/UD treebanks, Chinese-origin words com-posed of two morphophonological units are not segmented, even when they are semantically transparent. We propose segmenting and annotating these words with dependency relations in order to achieve a more fine-grained and unified description of both languages. As an example, we ap-ply this analysis to the pre-annotated GSD corpora in SUD format, and we examine the benefits and limitations of a rule-based approach. 46 Rachel Tal, Shlomit Fuchs, Orly Albeck, Elisheva Brauner, Yitzchak Lindenbaum, Ephraim Meiri, Avi Shmidman A New Hebrew Universal Dependency Treebank: The First Treebank of Post-Rabbinic Historical Hebrew The corpus of post-Rabbinic historical Hebrew is a foundational corpus of Jewish heritage, containing over a billion words of legal, hermeneutical, and philosophic texts (and more). However, because the linguistic norms of the corpus diverge so often from that of modern Hebrew, the corpus cannot be computationally analyzed with existing Hebrew parsers. In order to fill this lacuna, we present the first Universal Dependencies corpus of post-Rabbinic historical Hebrew. The corpus comprises over 11,800 words, and we are pleased to release it to the community. Naeem Uddin, Daniel Zeman Universal Dependency Treebank for a low-resource Dardic Language: Torwali This paper presents and discusses the linguistic phenomena encountered in the development of the ongoing first ever universal dependency treebank for Torwali the Language. Torwali belongs to the Kohistani sub-group of Dardic Indo-Aryan languages, and is considered an endangered (Moseley, 2010) and indigenous language, which makes it extremely low-resourced in terms of linguistic and computational resources. With the aim of including Torwali in Universal Dependencies (UD) (de Marneffe et al. 2021), we are annotating a diverse set of example sentences for POS tags, features and dependency relations. 47 Izabela Czerniak, Debopam Das Syntax of referents of relative markers: Evidence from a corpus of learner English We investigate the referents of relative markers of English relative clauses, focusing on their syntactic role in the matrix clauses. The referents, unlike relative markers and related features, have compratively remained under-studied. We examine the syntactic environments of the referents as part of a larger project, which develops the ICLE-RC, a corpus of learner Eng-lish texts annotated for relative clauses and related phenomena (it-/pseu-do-clefts, existential-relatives, etc.). The corpus derives from the Interna-tional Corpus of Learner English (ICLE; Granger et al., 2020), and contains 144 academic essays, representing six L1 backgrounds – Finnish, Italian, Polish, Swedish, Turkish, and Urdu. We annotate those texts for over 900 relative clauses (and over 400 related phenomena), with respect to a wide array of lexical, syntactic, semantic, and discourse features. Results from our analysis show that the relativisation of referents varies according to their syntactic functions. The referents are also observed to interact with other RC-features, yielding systematic variations across different L1 backgrounds, (some of) which can potentially be attributed to the typological properties of the associated L1. 48 Luka Terčon, Kaja Dobrovoljc ComparaTree: A Multi-Level Comparative Treebank Analysis Tool ComparaTree is a tool for comparative treebank analysis that combines various methods of quantitative linguistic analysis to provide a general overview of the differences and similarities between two treebanks. The comparison tool covers a range of subfields of linguistic analysis, provid-ing a summary of the differences and similarities in terms of the lexical di-versity, n-gram diversity, part-of-speech and dependency relation propor-tions, syntactic complexity, and syntactic diversity. We explain the various quantitative analyses performed on every level along with the generation of graphical visualizations, which add value by enabling user-friendly com-parisons at a glance. We exemplify the comparison process by presenting the results produced by the tool when comparing two treebanks from the Universal Dependencies collection. Stefanie Dipper Metaphorical Heads and Literal Dependents: Syntactic Properties of Metaphors in German In this paper we examine the way metaphors are expressed in language. Our starting hypothesis is that the two expressions that are central to meta-phor – namely the metaphorical expression and the expression that repre-sents the target of the metaphorical transfer – typically stand in a syntactic dependency relation: metaphorical heads govern literal dependents. An analysis of German sermons with 30k words confirms that the hypothesis applies in 67% of the cases. 10% show the reverse relationship and in 23% there is a common ancestor. 49 Asa Tomita, Hitomi Yanaka, Daisuke Bekki Automatic Evaluation of Linguistic Validity in Japanese CCG Treebanks In natural language inference, the accuracy of systems based on composition-al semantics depends on the quality of syntactic analysis, which in turn relies on linguistically valid training and evaluation data, typically provided by tree-banks. However, conventional treebank evaluation metrics focus on data cov-erage and fail to assess the linguistic validity of syntactic structures. This paper proposes novel evaluation methods to enable automatic and multifaceted as-sessment of linguistic validity. We apply these methods to a Japanese treebank based on combinatory categorial grammar and report the evaluation results. Jingyi Li, Jin Zhao, Nianwen Xue, Shili Ge Annotation of Chinese Light Verb Constructions within UMR This paper discusses the challenges of annotating predicate-argument struc-tures in Chinese light verb constructions (LVCs) within the Uniform Meaning Representation (UMR) framework, a cross-linguistic extension of Abstract Meaning Representation (AMR). A central challenge lies in reliably identi-fying LVCs in Chinese and determining their appropriate representation in UMR. We analyze the linguistic properties of Chinese LVCs, outline annota-tion difficulties for these structures and related constructions, and illustrate these issues through concrete examples. Our analysis focuses specifically on LVC.full types, where the light verb serves solely to convey morphological features and aspectual information. We exclude LVC.cause types, in which the light verb introduces an additional argument (e.g., a causal agent or source) to the event or state denoted by the nominal predicate. To address the practical challenge of semantic role assignment in Chinese LVCs, we pro-pose a dual-path annotation approach: due to the compositional nature of these constructions, we recommend independently annotating the argu-ment structure of the nominal predicate while systematically encoding the grammatical attributes and relations introduced by the light verb. 50 Brandon Waldon, Micaela Wells, Devika Tiwari, Meru Gopalan, Nathan Schneider Legal-CGEL: Analyzing Legal Text in the CGELBank Framework We introduce Legal-CGEL, an ongoing treebanking project focused on syntactic analysis of legal English text in the CGELBank framework (Reyn-olds et al., 2022), with an initial focus on US statutory law. When it comes to treebanking for legal English, we argue that there are unique advantages to employing CGELBank, a formalism that extends a comprehensive—and authoritative—formal description of English syntax (the _Cambridge Gram-mar of the English Language_; Huddleston & Pullum, 2002). We discuss some analytical challenges that have arisen in extending CGELBank to the legal domain. We conclude with a summary of immediate and longer-term project goals. Claudia Corbetta, Federica Iurescia, Marco Carlo Passarotti «Are you Afraid of Ghosts?» A Proposal for Busting Predicate Ellipsis in Universal Dependencies This paper addresses the representation of ellipsis in dependency syntax, proposing both a theoretical and a practical workflow for its analysis and annotation in treebanks, following the state-of-the-art Universal Depend-encies framework. We discuss the challenges of annotating ellipsis, with a focus on predicate ellipsis and its representation in dependency treebanks, and emphasize the importance of accounting for such phenomena for syn-tactic analysis and machine learning applications. We present a case study based on the Italian-Old treebank, demonstrating the applicability of the proposed workflows and invite the community to participate in this initia-tive with their own languages. 51 Deborah Watty, Yung-Jui Yao, Jens N. Watty Case Syncretism in Kasavakan Puyuma: A Field Data Analysis of Noun Phrase Markers Previous research has reported differing patterns of case syncretism across three dialects of Puyuma, an Austronesian language of Taiwan (Nanwang, Katipul, Ulivelivek). This study presents a quantitative analysis of case syncretism of noun phrase markers and disambiguation strategies in the Kasavakan dialect. Our dataset comprises 377 sentences elicited from five speakers, which we annotated for voice, potential semantic ambiguity, word order, and case marking of different semantic roles. We find evidence for a high degree of syncretism between genitive and nominative markers, alongside a decline in the use of genitive forms, particularly for common definite nouns. Some overlap with oblique markers is also attested, sug-gesting varying degrees of case syncretism between speakers. Topicaliza-tion appears to be the most frequent disambiguation strategy, while the or-der of non-topicalized noun phrases does not seem to aid disambiguation. Other factors, including age and individual experiences may contribute to inter-participant variation. These findings contribute to a more complete understanding of case marking in Puyuma by adding new empirical data from the Kasavakan dialect, where patterns of syncretism and disambigua-tion differ from previously described varieties. 52 Linda Wiechetek, Flammie A Pirinen, Maja Lisa Kappfjell How to Create Treebanks without Human Annotators -- An Indigenous Language Grammar Checker for Treebank Construction Creating treebanks for low resource languages is an important task. How-ever, low resource Indigenous language contexts have not only limited resources in terms of text data, but also limited human resources that are available for linguistic annotation. We suggest a work-around by applying a Constraint Grammar operated rule-based dependency parser to do the work of creating a marked-up treebank. However, due to a lot of noise, meaning spelling and grammatical errors in South Sámi written texts, this tool often fails to create complete and correct trees. As a fix to this, we created a grammar checking tool for the most common South Sámi gram-matical error types, which improves the quality of the dependency parser significantly. As both literacy and normative standards for most Indigenous languages are much more recent than for majority languages, spelling and grammatical variation and errors are a common source of noise, and the application of a correction tool like ours can be useful in the construction of treebanks for these languages. Maria Paz Botero-Garcia, Emmett Strickland, Bruno Guillaume, Sylvain Kahane, Anne Lacheret-Dujour An intonosyntactic treebank for spoken French: What is new with Rhapsodie? This paper presents a new format of the Rhapsodie Treebank, which con-tains both syntactic and prosodic annotations, offering a comprehensive dataset for the study of spoken French.This integrated format allow us for complex multilevel queries and open the way for the extraction of intono-syntactic studies. 53 QUASY 2025 3rd Workshop on Quantitative Syntax Abstracts Keynote Xiaofei Lu (The Pennsylvania State University) The rhetorical and pragmatic functions of syntactically complex structures in academic and second language writing Previous studies of linguistic complexity in academic and second language (L2) writing has often focused on quantitative differences across different writer groups and/or longitudinal changes over time, without systematic attention to the rhetorical or pragmatic functions that complex forms are used to convey. This talk argues for the importance of and delineates the scope of the function dimension of linguistic complexity analysis in L2 writ-ing research, reviews the methods and findings of emerging efforts on this dimension, and discusses how future L2 writing research could attend to this dimension. 55 Michaela Nogolová, Michaela Koščová, Jan Macutek, Radek Cech Syntactic units and their length distributions: A case study in Czech This study investigates the length distributions of syntactic units in Czech across multiple hierarchical levels: sentences, independent clauses, claus-es, phrases, subphrases, and chunks. Using a diverse dataset – including Universal Dependency treebanks, presidential speeches, the Czech Bible, and random sample from corpora of modern Czech – the analysis exam-ines whether lengths of these syntactic units follow consistent distribution-al patterns. Length is defined as the number of immediate subunits, and the distributions were modeled using the hyper-Poisson distribution. The results demonstrate that the hyper-Poisson model fits well distributions of length of all abovementioned syntactic units, pointing to a common princi-ple underlying the organization of syntactic structure in Czech. 56 Jianwei Yan, Heng Chen Modeling the Law of Abbreviation in Classical, Modern, and ChatGPT-Generated Chinese: A Power-Law Analysis of Structural Economy This study investigates the Law of Abbreviation—the inverse relation-ship between word length and frequency—across Classical, Modern, and ChatGPT-generated Chinese. Using a tri-partite parallel corpus and a pow-er-law model y = a*x^(-b), we analyze the relationship between word length and the average usage frequency of words within a given word length cate-gory to assess structural economy. Results confirm consistent Zipfian distri-bution across all text types, with high R2 values indicating strong model fit. However, the parameter b varies significantly: Classical Chinese shows the steepest decline, suggesting strong pressure for brevity; Modern Chinese exhibits a moderated pattern; ChatGPT-generated texts display the weak-est pressure, prioritizing fluency over compression. These differences re-flect evolving communicative priorities and reveal that while AI models can mimic statistical distributions, they underrepresent deeper structural pres-sures found in natural language evolution. This study offers new insights into lexical optimization and the parameter b offers a useful metric for com-paring structural efficiency across modalities. Implications are discussed in relation to language modeling, cognitive economy, and the evolution of linguistic structure. 57 Taiga Ishii, Yusuke Miyao On the Flatness, Non-linearity, and Branching Direction of Natural Language and Random Constituency Trees: Analyzing Structural Variation within and across Languages Natural languages exhibit remarkable diversity in their syntactic structures. Previous research has investigated the cross-lingual differences in local structural features such as word order or dependency relations. However, considering structural variation within individual language, it remains un-clear how such features influence the variation in the overall constituency tree structure and hence the structural variation across languages. To this end, we focus on the shape of constituency trees, analyzing the cross-lin-gual overlap in the distributions of flatness, non-linearity, and branching direction. While acknowledging that the findings may be influenced by the potential annotation idiosyncrasies across treebanks, the experiments quantitatively suggest that flatness and branching direction vary signif-icantly across languages. As for non-linearity, the cross-lingual difference was relatively small, and the distributions tend to skew towards linear struc-tures. Furthermore, comparison with randomly generated trees suggests that while phrase category and frequency information is crucial for repro-ducing the branching direction found in natural languages, non-linearity can be replicated reasonably well even without such information. 58 Santiago Herrera, Ioana-Madalina Silai, Bruno Guillaume, Sylvain Kahane Extraction of Contrastive Rules from Syntactic Treebanks: A Case Study in Romance Languages In this paper, we develop a data-driven contrastive framework to extract common and distinctive linguistic descriptions from syntactic treebanks. The extracted contrastive rules are defined by a statistically significant dif-ference in precision and classified as common and distinctive rules across the set of treebanks. We illustrate our method by working on object word order using Universal Dependencies (UD) treebanks in 6 Romance lan-guages: Brazilian Portuguese, Catalan, French, Italian, Romanian and Span-ish. We discuss the limitations faced due to inconsistent annotation and the feasibility of conducting contrasting studies using the UD collection. Yaqin Wang A Quantitative Study of Syntactic Complexity across Genres: Dependency Distance in English and Chinese This study investigates syntactic complexity in fiction and news genres by analyzing mean dependency distances (MDD) across controlled sentence lengths in English and Chinese corpora. Results show that English fiction exhibits greater MDD than news, while Chinese fiction shows the reverse. More complex syntactic structures, i.e., complex coordination structures, are found in English fiction texts than in news writing. In contrast, Chinese news writing relies more on nominal modification and prepositional phras-es that create long-distance dependencies than fiction texts. These find-ings show deviations from uniform correlations between genre formality and syntactic complexity across languages. 59 Žaneta Stiborská, Michaela Nogolová, Xinying Chen, Miroslav Kubát Syntactic Complexity in L2 Reading: A Comparison of Adapted and Original Czech Texts This corpus-based study explores the syntactic complexity of adapted Czech texts designed for learners of Czech as a second language (L2). It investigates how syntactic complexity varies according to learner proficien-cy levels (A2, B1, B2) as defined by the Common European Framework of Reference for Languages (CEFR) and how these adapted texts differ from their original versions. Quantitative analyses using metrics such as average sentence length (ASL), average clause length (ACL), mean dependency distance (MDD), and mean hierarchical distance (MHD) demonstrate clear systematic simplifications in adapted texts at lower proficiency levels. At A2 and B1 levels, adapted texts were found to be significantly less syntactically complex compared to their original counterparts. However, these differ-ences diminished notably at the B2 proficiency level, indicating a gradual alignment of adapted texts with native-level syntactic complexity as learner proficiency increased. These results underscore the importance of careful syntactic calibration in creating educational materials for language learners, highlighting implications for curriculum design, instructional methodolo-gies, and materials development. The findings offer valuable insights for language educators and textbook authors aiming to optimize reading ma-terials to support language acquisition effectively. 60 Tina Munda, Špela Arhar Holdt First Insights into the Syntax of Slovene Student Writing: A Statistical Analysis of Šolar 3.0 vs. Učbeniki 1.0 This study investigates the syntactic features of Slovene student writing by comparing essays from the Šolar 3.0 corpus (ages 13–19; primary and second-ary school levels) with textbook texts from the Učbeniki 1.0 corpus aligned to the same educational stages. We apply quantitative syntactic analysis at two complementary levels: clause-type frequency (coordination, parataxis, and four types of subordination) and tree-based syntactic complexity meas-ures (number of clauses, clauses per T-unit, and maximum parse-tree depth). Results show that students heavily rely on coordination and specific subor-dinate clauses (especially object and adverbial), producing more clauses per sentence and per T-unit than textbooks. However, their sentences tend to exhibit flatter syntactic structures, with shallower embedding in primary school and only modest increases in tree depth by secondary school. These findings reveal a divergence between surface-level complexity and hierar-chical depth, highlighting developmental trends and instructional targets in written syntactic maturity. We conclude by discussing implications for syn-tactic development and directions for future research. Marina Cerebrinsky Subject-Verb Agreement Alternations in Spanish Pseudopartitive Constructions: A Corpus Study Pseudopartitive constructions, following the format N1-of-N2 (such as a group of students), are known to feature alternations in their subject-verb agreement patterns, either with the N1 or the N2. Through a corpus analysis, this study investigates the possibility of a correlation between the choice of N1/N2 as an agreement trigger and the semantic type of the N1, as well as the animacy status of the N2. Although a positive correlation was found for N1 semantic type, no statistically significant results emerged for N2 animacy. 61 Jaka Čibej A Computational Method for Analyzing Syntactic Profiles: The Case of the ELEXIS-WSD Parallel Sense-Annotated Corpus In the paper, we present an approach to comparing corpora annotated with dependency relations. The method relies on the compilation of syn-tactic profiles – numeric vectors representing the relative frequencies of different syntactic (sub)trees extracted automatically with the STARK 3.0 open-access dependency tree extraction tool. We perform the extraction on the ELEXIS-WSD Parallel Sense-Annotated Corpus, which has recently been published as version 1.2 with UD dependency relation annotations for 10 European languages. The corpus provides an additional resource for contrastive studies in quantitative syntax. In addition to presenting the cor-pus and conducting some proof-of-concept analyses, we discuss several other potential uses and improvements to the proposed approach. Miroslav Kubát, Xinying Chen, Michaela Nogolová, Michal Místecký Syntactic Complexity and News Credibility in Czech Media This study examines how syntactic complexity varies across news articles differing in credibility, using a Czech-language corpus annotated with five credibility levels: credible, partially credible, misleading, manipulative, and unclassifiable. We apply a dependency parsing pipeline and compute five syntactic metrics measuring features such as sentence length, clause densi-ty, and hierarchical depth. Results show that manipulative texts are structur-ally the most complex, while misleading and unclassifiable texts are simpler and more fragmented. Credible texts display balanced complexity consist-ent with journalistic norms. These findings highlight the role of syntax in shaping rhetorical strategies and contribute to the linguistic understanding of news credibility. 62 Masanori Oya Degree centrality as a measure of robustness of dependency structures of the sentences in a large-scale learner corpus of English This paper examines the differences in the robustness of syntactic de-pendency structures in written English produced by learners of varying proficiency levels and by native English speakers. The robustness of these dependency structures is represented by their degree centralities, and cor-pus-based investigation revealed that learners with higher proficiency lev-els tend to produce sentences with lower degree centralities. This means that they produce more robust, and more embedded sentences. It is also revealed that the sentences produced by native speakers of English tend to produce more embedded sentences than non-native speakers. Serhii D Prykhodchenko, Oksana Yu. Prykhodchenko Application of Existing Readability Methods to the Ukrainian Language: A Comprehensive Study The Ukrainian language currently lacks a well-developed framework for as-sessing text readability. This study addresses this gap by focusing on three key contributions. First, we present the creation of UkrTB, a Ukrainian-lan-guage corpus of texts categorized by reader age. Second, we conduct a statistical analysis of the corpus, evaluating key linguistic features such as sentence length, word complexity, and part-of-speech distribution. Third, we systematically assess the applicability of existing readability formulas, including Flesch, Flesch-Kincaid, Matskovskii, Pisarek, and Solnyshkina et al., to Ukrainian texts. Our findings indicate that readability models devel-oped for English and other Slavic languages exhibit significant limitations when applied to Ukrainian. While some methods demonstrate partial cor-relation with expected readability levels, others produce inconsistent re-sults, underscoring the need for a specialized readability metric tailored to Ukrainian. This work lays the foundation for further research in Ukrainian readability assessment and the development of language-specific models. 63 Isabell Landwehr The Interplay of Noun Phrase Complexity and Modification Type in Scientific Writing We investigate the interplay of noun phrase (NP) complexity and modifica-tion type, namely the choice between pre- and postmodification, using a corpus-based approach. Our dataset is the Royal Society Corpus (RSC, Fis-cher et al. 2020), a diachronic corpus of English scientific writing. We find that the number of dependents, length of the head noun and distance to the head noun’s own syntactic head (typically the main verb) affect the like-lihood of pre- vs. postmodification: NPs with more dependents are more likely to be premodified, NPs with a longer head noun and a head noun closer to its own head are more likely to be postmodified. In addition, we find an effect of syntactic role and definiteness as well as time: The like-lihood of premodification over postmodification increases with time and subject NPs as well as indefinite NPs are more likely to be premodified than NPs in other syntactic roles or definite NPs. 64 Josh Higdon, Valeria Pagliai, Zoey Liu Predictability Effects of Spanish-English Code-Switching: A Directionality and Part of Speech Analysis Research on code-switching (CS), the spontaneous alternation between two or more languages within a discourse, remains relatively new and of-ten limited by the use of elicited production tasks, with some exceptions leveraging naturalistic corpora. This study analyses the effects of language directionality and part-of-speech (POS) tags on Spanish-English CS pro-duction between corpus modalities and speech communities. We use data from two spoken corpora: Miami Bangor Corpus (MBC; N = 261,711) and Spanish in Texas Corpus (STC; N = 416,784), as well as the written LinCE Corpus (N=278,093). Bootstrap analyses indicate that Spanish serves as the matrix language (i.e., the most used) for MBC and LinCE, while English is for STC. Logistic regression analyses show that the particle-coordinating conjunction combination was the strongest POS predictor of a CS. The re-sults suggest that corpus modality and the speech community affect matrix language proportions and that both previously attested and unseen POS combinations modulate the production of Spanish-English CS. 65 Akshit Kumar, Dipti Sharma, Parameswari Krishnamurthy Do Multilingual Transformers Encode Paninian Grammatical Relations? A Layer-wise Probing Study Large multilingual transformers such as XLM-RoBERTa achieve impressive performance on diverse NLP benchmarks, but understanding how they in-ternally encode grammatical information remains challenging. This study investigates the encoding of syntactic and morphological information de-rived from the Paninian grammatical framework—specifically designed for morphologically rich Indian languages—across model layers. Using diag-nostic probing, we analyze the hidden representations of frozen XLM-RoB-ERTa-base, mBERT, and IndicBERT models across seven Indian languages (Hindi, Kannada, Malayalam, Marathi, Telugu, Urdu, Bengali). Probes are trained to predict Paninian dependency relations (by edge probing) and essential morphosyntactic features (UPOS tags, Vibhakti markers). We find that syntactic structure (dependencies) is primarily encoded in the mid-dle-to-upper-middle layers (layers 6–9), while lexical features peak slightly earlier. Although the general layer-wise trends are shared across models, significant variations in absolute probing performance reflect differences in model capacity, pre-training data, and language-specific characteristics. These findings shed light on how theory-specific grammatical information emerges implicitly within multilingual transformer representations trained largely on unstructured raw text. 66 ORGANIZATION Organizing Institutions The Faculty of Arts of the University of Lju-bljana (UL FF) educates students and creates top-quality educators with open and criti-cal thinking in the humanities and social sci-ences, as well as educating teachers in these fields. It pays special attention to strength-ening the disciplines of national importance that co-create Slovenian identity. The Faculty of Computer and Information Science of the University of Ljubljana (UL FRI) is Slovenia’s leading educational and research institution for computer and information sci-ence. The Faculty’s main function is educat-ing undergraduate and graduate computer science experts of various profiles, as well as engaging in research work which generates new knowledge and uncovers solutions to contemporary problems. The Slovenian Language Technologies So-ciety (SDJT) was founded in 1998 and joins people working on language technologies from the scientific, educational or user per-spectives. The activities of the SDJT are aimed at promoting the development of language technologies for the Slovenian language. 67 Local Organizing Committee • Kaja Dobrovoljc, chair • Luka Terčon • Sara Kos • Matej Klemen • Tinca Lukan • Timotej Knez • Špela Arhar Holdt • Marko Robnik-Šikonja Programme Chairs 18th International Conference on Parsing Technologies (IWPT): • Kenji Sagae (University of California, Davis) • Stephan Oepen (University of Oslo) 8th Universal Dependencies Workshop (UDW): • Gosse Bomma (University of Groningen) • Çağrı Çöltekin (University of Tübingen) 8th International Conference on Dependency Linguistics (DepLing): • Eva Hajičová (Charles University, Prague) • Sylvain Kahane (Université Paris Nanterre) 23rd Workshop on Treebanks and Linguistic Theories (TLT): • Heike Zinsmeister (University of Hamburg) • Sarah Jablotschkin (University of Hamburg) • Sandra Kübler (Indiana University) 3rd Workshop on Quantitative Syntax (QUASY): • Xinying Chen (University of Ostrava) • Yaqin Wang (Guangdong University of Foreign Studies) 68 Publication Chair • Sarah Jablotschkin (University of Hamburg) Programme Committee • Mahesh Akavarapu (Eberhard-Karls-Universität Tübingen) • Leonel Figueiredo de Alencar (Federal University of Ceará) • Patricia Amaral (Indiana University) • Giuseppe Attardi (University of Pisa) • John Bauer (Stanford University) • David Beck (University of Alberta) • Laura Becker (Albert-Ludwigs-Universität Freiburg) • Aleksandrs Berdicevskis (Gothenburg University) • Ann Bies (University of Pennsylvania) • Igor Boguslavsky (Universidad Politécnica de Madrid) • Bernd Bohnet (Google) • Cristina Bosco (University of Turin) • Gosse Bouma (University of Groningen) • Miriam Butt (Universität Konstanz) • Giuseppe G. A. Celano (Universität Leipzig) • Heng Chen (Guangdong University of Foreign Studies) • Xinying Chen (University of Ostrava) • Jinho D. Choi (Emory University) • Cagri Coltekin (University of Tuebingen) • Daniel Dakota (Leidos) • Stefania Degaetano-Ortlieb (Universität des Saarlandes) • Kaja Dobrovoljc (University of Ljubljana) • Jakub Dotlacil (Utrecht University) • Gülşen Eryiğit (Istanbul Technical University) 69 • Kilian Evang (Heinrich Heine University Düsseldorf) • Pegah Faghiri (CNRS) • Ramon Ferrer-i-Cancho (Universidad Politécnica de Cataluna) • Marcos Garcia (Universidade de Santiago de Compostela) • Kim Gerdes (Université Paris-Saclay) • Loïc Grobol (Université Paris Nanterre) • Bruno Guillaume (INRIA) • Carlos Gómez-Rodríguez (Universidade da Coruña) • Eva Hajicova (Charles University) • Dag Trygve Truslew Haug (University of Oslo) • Santiago Herrera (University of Paris Nanterre) • Richard Hudson (University College London) • Maarten Janssen (Charles University Prague) • Jingyang Jiang (Zhejiang University) • Mayank Jobanputra (Universität des Saarlandes) • Sylvain Kahane (Université Paris Nanterre) • Václava Kettnerová (Charles University Prague) • Sandra Kübler (Indiana University) • Guy Lapalme (University of Montreal) • François Lareau (Université de Montréal) • Miryam de Lhoneux (KU Leuven) • Zoey Liu (University of Florida) • Teresa Lynn (Dublin City University) • Jan Macutek (Slovak Academy of Sciences) • Robert Malouf (San Diego State University) • Marie-Catherine de Marneffe (UCLouvain) • Nicolas Mazziotta (Université de Liège) • Alexander Mehler (Johann Wolfgang Goethe Universität Frankfurt am Main) • Maitrey Mehta (University of Utah) • Wolfgang Menzel (Universität Hamburg) • Marie Mikulová (Charles University) • Aleksandra Miletić (University of Helsinki) 70 • Jasmina Milićević (Dalhousie University) • Simon Mille (Dublin City University) • Yusuke Miyao (The University of Tokyo) • Noor Abo Mokh (Indiana University) • Simonetta Montemagni (Institute for Computational Linguistics “A. Zampolli” (ILC-CNR)) • Jiří Mírovský (Charles University Prague) • Kaili Müürisep (Institute of computer science, University of Tartu) • Anna Nedoluzhko (Charles University Prague) • Ruochen Niu (Beijing Language and Culture University) • Joakim Nivre (Uppsala University) • Stephan Oepen (University of Oslo) • Timothy John Osborne (Zhejiang University) • Petya Osenova (Sofia University “St. Kliment Ohridski”) • Agnieszka Patejuk (Polish Academy of Sciences) • Lucie Poláková (Charles University Prague) • Prokopis Prokopidis (Athena Research Center) • Mathilde Regnault (Universität Stuttgart) • Kateřina Rysová (University of South Bohemia) • Magdaléna Rysová (Charles University Prague) • Tanja Samardzic (University of Zurich) • Giuseppe Samo (Beijing Language and Culture University) • Haruko Sanada (Rissho University) • Nathan Schneider (Georgetown University) • Djamé Seddah (Sorbonne University) • Anastasia Shimorina (Orange) • Maria Simi (University of Pisa) • Achim Stein (University of Stuttgart) • Daniel G. Swanson (Indiana University) • Luka Terčon (Faculty of Arts, University of Ljubljana) • Giulia Venturi (Institute for Computational Linguistics “A. Zampolli” (ILC-CNR)) • Veronika Vincze (University of Szeged) 71 • Yaqin Wang (Guangdong University of Foreign Studies) • Pan Xiaxing (Huaqiao University) • Chunshan Xu (Anhui Jianzhu University) • Nianwen Xue (Brandeis University) • Jianwei Yan (Zhejiang University) • Zdenek Zabokrtsky (Charles University Prague) • Eva Zehentner (University of Zurich) • Amir Zeldes (Georgetown University) • Daniel Zeman (Charles University Prague) • Šárka Zikánová (Charles University Prague) • Heike Zinsmeister (Universität Hamburg) 72 Support We gratefully acknowledge the support of the following institutions and organizations whose contributions have helped make SyntaxFest 2025 possible: • Centre for Language Resources and Technologies at the University of Ljubljana (CJVT UL) • Slovene Common Language Resources and Technology Infrastructure (CLARIN.SI) • COST Action CA21167 - Universality, diversity and idiosyncrasy in lan- guage technology (UniDive) • The Centre of Excellence in Artificial Intelligence for Digital Humanities (CoE AI4DH) • City of Ljubljana • Ljubljana Tourism • General Representative of Flanders in Austria, Hungary, Czech Repub- lic, Slovakia and Slovenia • Vitasis d.o.o. • Alpineon d.o.o. • Amebis d.o.o. • Ustanova patra Stanislava Škrabca 73 Acknowledgment The organization of SyntaxFest 2025 was partially supported by the Slove-nian Research and Innovation Agency through the research program Lan-guage Resources and Technologies for Slovene (P6-0411) and the projects SPOT: A Treebank-Driven Approach to the Study of Spoken Slovenian (Z6-4617) and LLM4DH: Large Language Models for Digital Humanities (GC-0002). Co-funded by the European Union HORIZON-WIDERA-2023-TAL-ENTS-01-01 grant 101186647 - AI4DH. https://syntaxfest.github.io/syntaxfest25/