DOI: 10.4312/linguistica.65.1.127-143
Matej Meterc*
ZRC SAZU, Inštitut za slovenski jezik Frana Ramovša
Rok Mrvič**
ZRC SAZU, Inštitut za slovensko narodopisje
THE BEST KNOWN AND FREQUENTLY USED SLOVENE 
PROVERBS ACCORDING TO CHATGPT-4O:  
EXPLORING THE POTENTIAL FOR AN AI-BASED 
PAREMIOLOGICAL MINIMUM
1  SOURCES OF LINGUISTIC DATA ON CONTEMPORARY, ACTIVELY 
USED PAREMIOLOGY IN THE SLOVENE LANGUAGE
Prior to the emergence of modern lexicography, particularly the corpus-driven lexicography
of recent decades, Slovene proverbs and related paremiological expressions were preserved,
beyond their primary domain of oral communication, in collections.1 These proverb
collections offered readers heterogeneous lists of expressions, drawn from earlier
compilations and intermingled with items known to the collectors from active usage or
recorded during fieldwork.
Among Slovene dictionaries that documented contemporary proverbs at the time
of their compilation, the Dictionary of the Slovene Standard Language (Slovar
slovenskega knjižnega jezika, SSKJ) stands out, containing approximately 600
paremiological expressions. Research conducted in the context of identifying a Slovene
paremiological minimum and optimum revealed (Meterc 2017: 213) that 73.6% of
the paremiological expressions found in the SSKJ are still known today by more
than half of the survey respondents. Corpus-based research on Slovene proverbs for
the establishment of a paremiological optimum (Meterc 2017: 75–107) provided an
empirical foundation for the integration of paremiology into the Dictionary of the
Slovene Standard Language, 3rd Edition, published on the Fran.si portal since 2016
under the title eSSKJ. In this dictionary, phraseology and paremiology are
systematically represented through corpus analysis and surveys (Meterc, Jakop 2016; Meterc
2019). Similar corpus-based methodologies underlie the Dictionary of Proverbs and
Similar Paremiological Expressions (Slovar pregovorov in sorodnih paremioloških
izrazov, SPP; Meterc 2020–), which employs a broader range of corpora (Meterc
2023: 124–125) and also draws on data from studies of the Slovene paremiological
minimum and optimum.
* matej.meterc@zrc-sazu.si
**  rok.mrvic@zrc-sazu.si
1 This contribution was prepared within the framework of research programmes P6-0038 and P6-0088,
 and project J6-50197, financed by the Slovenian Research and Innovation Agency (ARIS).
Artificial intelligence undoubtedly holds great potential for the analysis of linguistic 
data within lexicography (de Schryver 2023; Jakubiček, Rundell 2023). The focus of 
this paper, however, lies elsewhere—we concentrate on a much more basic level of 
inquiry: the retrieval of representative paremiological expressions in a given language 
based on their frequency and familiarity among speakers, by posing direct questions 
to an AI model. The aim is to assess the extent to which the AI model (ChatGPT-4o) 
serves as a useful and reliable source of paremiological material.
A brief note on the selection of the large language model (LLM) used in our research:
we chose ChatGPT due to its wide popularity and high capability, which, shortly after
the start of our study, began to be rivaled by several other emerging models. After
testing various later models (such as Grok, DeepSeek, and the Slovene GaMS) as well as
subsequent versions of ChatGPT, we can add that, at the time of this article’s
publication, the results of these models appear broadly similar at first glance.
Nevertheless, the newly emerging models and updated versions of existing ones represent
significant potential for future research in paremiology. An in-depth comparative analysis
of results produced by different models, however, lies beyond the scope of this article.
2  AN ANALYSIS OF RESPONSES CONCERNING SLOVENE PROVERBS 
AND THE RESEARCH POTENTIAL OF AI USAGE IN COMBINATION 
WITH LANGUAGE CORPORA AND SURVEYS
ChatGPT-4o is an intriguing source of paremiological material for several reasons. 
Two distinct spheres of its potential use may be highlighted. 
The first pertains to the growing role of AI models as information sources for a
broad range of users who are not linguists, but who may, for various reasons, be
interested in proverbs in a given language, whether their own or a foreign one. In such
cases, artificial intelligence serves as an alternative to the diverse array of other online
resources, which include both specialist (paremiographically designed) sources, such
as digitized and primarily online dictionaries and collections, and a multitude of
non-professional sources (e.g., forum discussions, articles focusing on “typical” or “unique”
proverbs of a particular language, etc.), which are often marked by a lack of
representativeness and inconsistency in their selection of primary materials.
The second sphere of artificial intelligence usage concerns specialized research in
linguistics, lexicography, paremiography, and folkloristics. Our aim is to define (1) the
potential of AI-generated responses to complement data from linguistic corpora and
surveys, and (2) the relevance of such responses in the context of one of the key
theoretical and practically applicable paremiological concepts: the notions of the paremiological
minimum and optimum, which will be described in detail in Section 2.5.
This paper presents an evaluation of responses generated by artificial intelligence
through an LLM, using empirical data drawn from linguistic corpora and online surveys
among Slovene native speakers, available from previous research (Meterc 2017, 2023).
With regard to the paremiological material obtained via artificial intelligence, the paper 
seeks to address the following questions: 
1. How reliable are the responses in terms of the type of expression identified as
a proverb (typological perspective)? (Section 2.2)
2. How useful can such querying be in determining the most representative 
expressions (relevance in relation to the core of actively used paremiology in a 
given language)? (Section 2.3)
3. How reliable are the responses in terms of the form of individual proverbs 
(accuracy of the expressions with respect to attested variants and the most 
representative, canonical form)? (Section 2.4)
4. What is the potential for developing an LLM-based or LLM- and corpus-based 
paremiological minimum and optimum? (Section 2.5)
2.1 Procedure of the Inquiry: Questions Posed to ChatGPT-4o
 Following the initial test of paremiological inquiries using OpenAI’s GPT-3.5 model 
in January 2024, four more systematically formulated questions were submitted to 
the ChatGPT-4o model on June 5 and 17, 2024. The questions were composed in 
accordance with the following guideline from empirical paremiology:
Frequent proverbs tend to be familiar, whereas familiar proverbs may, but need 
not occur frequently. In linguistic (Jakobsonian) terms, a given culture’s stock 
of familiar proverbs thus turns out to be some kind of a paradigmatic inventory, 
from which items may be (or may not be) projected onto the syntagmatic axis of 
concrete (more or less frequent) proverb usage. (Grzybek, Chlosta 2008: 104) 
The size of the proverb sets used in each prompt was adjusted, as larger sets (we 
tested sets of up to 300 proverbs per prompt) significantly increased both the number 
of repeated or partially repeated proverbs and the number of hallucinations produced 
by the ChatGPT model. Test prompts were also run on different dates, but this had no 
observable impact on the frequency of repetitions or hallucinations, which ultimately 
led us to reduce the number of proverbs per set. A further reason for downsizing 
was the length limitation of this article, which did not allow for a detailed discussion 
of all findings that emerged from the comparative analysis of results. Thus, four 
questions were prepared, each formulated with a different degree of emphasis on 
empirical verifiability—or, in the case of the criterion of “popularity,” on its inherent 
indeterminacy:
A. Please generate a list of the 20 most common and widespread Slovene proverbs 
(5 June 2024).
B. Please generate a list of the 20 most well-known Slovene proverbs among 
speakers of Slovene (5 June 2024).
C. Please generate a list of the 20 Slovene proverbs that are most common in written
texts (17 June 2024).
D. Please generate a list of the 20 most popular Slovene proverbs (17 June 2024).
In response to our follow-up question as to whether the numbering of examples 
in its answers reflects the degree of their representativeness, ChatGPT-4o stated that 
“the order does not reflect any ranking of importance, popularity, or frequency of use.” 
Nevertheless, as shown in Table 1 below, the order of items in responses to similar 
questions submitted on the same date remains largely consistent, contributing to the 
clarity of presentation but carrying no further significance.
The expressions listed in response to the first question (Column A) are labeled 
with the letter A and numbered from 1 to 20. To maintain conciseness and facilitate 
an overview of overlapping expressions across different responses, the subsequent 
columns (B, C, and D) refer to identical items by citing their designation from Column 
A (e.g., A1 in Column B). New expressions not appearing in Column A are marked 
with the respective column label (e.g., B19 and B20 for expressions unique to Column 
B, or C2 for one absent in both Columns A and B).
Expressions that are either included in the Slovene paremiological minimum 
(accompanied by a percentage indicating familiarity among Slovene speakers) or occur 
frequently in Slovene language corpora compiled in the metaFida 1.0 corpus (approx. 6 
billion tokens) are highlighted in bold. Approximate corpus frequency from metaFida 
is indicated in the table following a dash (e.g., “– 300”). In the frequency analysis 
and expression identification, we employed paremiological search methods previously 
applied in the phraseographic work for the eSSKJ and SPP dictionaries (Meterc 2019). 
Generated forms that are not in active contemporary use are presented in regular font: in
some cases, the proverb itself is still commonly used in a different form (e.g., Prijatelja
spoznaš v nesreči), while in others it is rare or absent from modern Slovene usage and was
not included in the paremiological minimum. However, at least some evidence for these
latter expressions or their variants can be found either in contemporary materials or in
the Pregovori corpus (Babič et al. 2023).2 Three items that could not be confirmed in any corpus and are structurally clear
hallucinations are marked in italics and with an asterisk (*). Italics are also used to 
indicate expressions that are not proverbs but rather realizations of verbal idioms (e.g., 
Ne stavi na eno karto).
Where the AI-generated form differs from the base form listed in the SPP dictionary, 
the dictionary’s base form is provided (e.g. Molk je zlato.). The English translations of 
the generated proverb material are provided in parentheses based on the form of the 
Slovene proverbs. If an English equivalent exists that matches the Slovene proverb in 
both meaning and structural components, it is presented in parentheses in regular font. 
If no such equivalent exists in English, a literal translation (lit.) is provided. In cases 
where two forms of a proverb are listed within a single entry, the corresponding English 
translations follow in the same order.
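The annotation convention described above (a familiarity percentage, a dash, and an approximate corpus frequency, e.g. “99% – 300” or “– 800”) can be read mechanically. The following is only an illustrative sketch in Python: the names `Annotation` and `parse_annotation` are hypothetical and are not part of the study’s tooling.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    # Share of survey respondents who know the proverb (None if not surveyed).
    familiarity_pct: Optional[float]
    # Approximate number of occurrences in the corpus (None if not given).
    corpus_freq: Optional[int]


# Matches "98.4% – 280" (both values) or "– 800" (frequency only).
# Both an en dash and a plain hyphen are accepted as the separator.
_PATTERN = re.compile(r"^\s*(?:(\d+(?:\.\d+)?)\s*%)?\s*[–-]\s*(\d+)\s*$")


def parse_annotation(text: str) -> Annotation:
    """Parse a Table 1 style annotation; unknown formats yield empty fields."""
    match = _PATTERN.match(text)
    if match is None:
        return Annotation(None, None)
    pct, freq = match.groups()
    return Annotation(float(pct) if pct else None, int(freq))
```

An entry without any annotation (such as an idiom or a hallucinated form) simply yields two empty fields, which keeps it distinguishable from attested proverbs.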
2  The Pregovori corpus, currently in version 1.1, is the largest Slovene collection of
paremiological expressions, based on over 2,500 sources from 1578 to 2010, including books,
journals, calendars, fieldwork notes, and manuscripts. It contains more than 37,000
annotated paremiological expressions.
Table 1: A combined overview of ChatGPT’s answers to the four questions above,
presented as four separate lists of proverbs
| No. | A: “20 most common and widespread Slovene proverbs” (5 June 2024) | B: “20 most well-known Slovene proverbs” (5 June 2024) | C: “20 Slovene proverbs most common in written texts” (17 June 2024) | D: “20 most popular Slovene proverbs” (17 June 2024) |
|---|---|---|---|---|
| 1 | A1 Bolje vrabec v roki kot golob na strehi. (lit.: Better a sparrow in the hand than a pigeon on the roof.) 99% – 300 | A1 | A4 | A3 |
| 2 | A2 Kdor visoko leta, nizko pade. (Who flies high, falls deep.) 98.4% – 280 | A2 | C2 Po jutru se dan pozna. (lit.: The day is known by the morning.) – 800 | C2 |
| 3 | A3 Kjer je dim, tam je tudi ogenj. (Kjer je dim, je tudi ogenj.) (Where there’s smoke, there’s fire.) 93.9% – 500 | A3 | A1 | A1 |
| 4 | A4 Kuj železo, dokler je vroče. (Strike the iron, while it is hot.) 98.5% – 250 | A4 | A2 | A2 |
| 5 | A5 Kdo se smeje zadnji, se najslajše smeje. (Kdor se zadnji smeje, se najslajše smeje.) (lit.: He who laughs last, laughs the sweetest.) 99% – 180 | A5 | C5 Kar lahko storiš danes, ne odlašaj na jutri. (Don’t put off until tomorrow what you can do today.) 99% – 90 | C5 |
| 6 | A6 Pes, ki laja, ne grize. (Dog that barks does not bite.) 99.1% – 240 | A6 | C6 Tiha voda bregove dere. (lit.: Silent water wears away the banks.) 98% – 150 | C6 |
| 7 | A7 Pomoč prijatelja v nesreči je več vredna kot tisoč prijateljev v sreči. (lit.: The help of a friend in misfortune is worth more than a thousand friends in happiness.) (Prijatelja spoznaš v nesreči.) (lit.: You recognize a friend in misfortune.) 99% – 150 | A9 | C7 Vsak je svoje sreče kovač. (lit.: Everyone is the blacksmith of their own fortune.) 98.4% – 300 | C7 |
| 8 | A8 Ne stavi na eno karto. (lit.: Don’t bet on one card.) | A10 | C8 Ne glej darovanemu konju v zobe. (lit.: Don’t look into a gifted horse’s teeth.) (Podarjenemu konju se ne gleda v zobe.) (lit.: One shouldn’t look into a gift horse’s teeth.) 97.8% – 240 | C8 |
| 9 | A9 Bolje pozno kot nikoli. (Better late than never.) 98.7% – 630 | A13 | C9 Iz malega raste veliko. (lit.: From small grows great.) 97.2% – 460 | C10 |
| 10 | A10 Govorjenje je srebro, molčanje je zlato. (Speech is silver, silence is gold.) (Molk je zlato.) (Silence is golden.) 80.7% – 270 | A18 | C10 Počasi se daleč pride. (lit.: One gets far by moving slowly.) 99.4% – 740 | C16 |
| 11 | A11 Zarečenega kruha se največ poje. (lit.: The bread you swear off is the one you end up eating most.) 99.3% – 650 | A19 | C11 Eno jabolko na dan odžene zdravnika stran. (lit. An apple a day drives the doctor away.) – 170 | C19 |
| 12 | A12 Brez muje se še čevelj ne obuje. (lit.: Even a shoe can’t be put on without effort.) 98.7% – 370 | A20 | C12 Vsak začetek je težak. (Every beginning is difficult.) 97.4% – 940 | C15 |
| 13 | A13 Osel gre samo enkrat na led. (lit.: A donkey goes onto the ice only once.) 98.1% – 250 | A11 | C13 Kdor se zadnji smeje, se najslajše smeje. A5 Kdo se smeje zadnji, se najslajše smeje. (lit.: He who laughs last, laughs the sweetest.) 99% – 180 | C12 |
| 14 | A14 Priložnost dela tatu. (Opportunity makes the thief.) 91.7% – 390 | A14 | C14 Pometi pred svojim pragom. (lit.: Sweep in front of your own doorstep.) (Najprej pometi pred svojim pragom.) (lit.: Sweep in front of your own doorstep first.) 98.2% – 360 | C11 |
| 15 | A15 Sreča je opoteča. (Luck is fickle.) 97.8% – 200 | A12 | C15 Kakor boš postlal, tako boš ležal. (Kakor si boš postlal, tako boš spal.) (You’ve made your bed, now lie in it.) 98.4% – 110 | C13 Kdor se zadnji smeje, se najslajše smeje. A5 Kdo se smeje zadnji, se najslajše smeje. (lit.: He who laughs last, laughs the sweetest.) 99% – 180 |
| 16 | A16 Dva petelina se na enem kupu ne zmenita. (lit.: Two roosters don’t get along on the same dung heap.) | A15 | C16 Kdor drugemu jamo koplje, sam vanjo pade. (He who digs a pit for others falls in himself.) 99% – 370 | D16 Delo iz človeka naredi človeka.* (lit.: Work makes a man out of a man.) |
| 17 | A17 Daleč od oči, daleč od srca. (Far from the eyes, far from the heart.) 93% – 400 | A17 | C17 Ne boj se zgodnjega vstajanja, boj se poznega ležanja.* (lit.: Don’t be afraid of early rising, be afraid of going to bed late.) | D17 Kdor trdo dela, se daleč pride.* (lit.: He who works hard, gets far.) |
| 18 | A18 Rana ura, zlata ura. (lit.: Early hour, golden hour.) 99% – 430 | A16 | C18 Kdor prej pride, prej melje. (lit. He who arrives first, grinds first.) 98.4% – 1000 | C9 |
| 19 | A19 Jabolko ne pade daleč od drevesa. (The apple doesn’t fall far from the tree.) 99.1% – 640 | B19 Nič novega pod soncem. (Nothing new under the sun.) – 150 | C19 Laž ima kratke noge. (lit.: A lie has short legs.) 98.4% – 540 | C17 |
| 20 | A20 Pametni popustijo. (Pametnejši odneha.) (The wiser gives in.) 98.4% – 230 | B20 Vsak je svoje sreče kovač. (lit.: Everyone is the blacksmith of their own fortune.) 98.4% – 600 | C20 Kar seješ, to boš žel. (Kar seješ, to žanješ.) (As you sow, so shall you reap.) 93.3% – 450 | C18 |
2.2  The Accuracy of AI in Correctly Identifying the Genre of an Expression
Our first concern is the accuracy of the AI-generated responses in terms of whether the 
listed expressions conform to the definitional characteristics of a proverb. Specifically, 
we are interested in determining whether the provided examples can indeed be classified 
as proverbs, or whether they may in fact represent other types of phraseological or 
paremiological expressions—or even non-phraseological and non-paremiological 
expressions.
From a structural perspective, paremiological expressions differ from other types
of phraseologisms in that they constitute a complete text (a texteme), rather than
merely a fragment of one (Permjakov 1970: 19). Mlacek (1983: 131, 138) emphasized
that a proverb conveys a complete thought containing a generally valid logical
judgment. Although such judgments do not constitute a larger coherent system of
logic, they nevertheless function as self-contained units (Mieder 2004: 1). Proverbs
serve the function of “modeling reality”; they express regularities, provide guidance,
and offer moral instruction (Permjakov 1970: 9; Mlacek 1983: 131; Mieder 2004: 3;
Kržišnik 2008: 38).
The majority of the listed expressions are proverbs, as they conform to the
definitional characteristics outlined above. An exception is the verbal idiom staviti vse na eno
karto (lit.: to bet everything on one card, ‘to risk everything by relying on a single
option’), which appears in the imperative form Ne stavi na eno karto (lit.: Don’t bet on one
card) in the response to Question A. The misclassification of phrasemes, such as citing
one form of a multi-word expression instead of a proverb, is a relatively common
error even in responses from native speakers (Meterc 2017: 192). None of the proverbs
in the lists appear to be invented; however, one hallucinated expression appears in the
response to the third question, and three hallucinations are present in the response to
the fourth question; these are addressed further below. Each list contains between 17
and 20 actual proverbs, indicating an accuracy rate of 85 to 100 percent with respect to
genre classification.
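The percentage ranges reported here follow directly from the per-list counts and the list size of 20. As a minimal sketch of that arithmetic (the function name `accuracy_range` is hypothetical and introduced only for illustration):

```python
def accuracy_range(min_hits: int, max_hits: int, list_size: int = 20) -> tuple[float, float]:
    """Convert the lowest and highest per-list hit counts into percentages."""
    if not 0 <= min_hits <= max_hits <= list_size:
        raise ValueError("expected 0 <= min_hits <= max_hits <= list_size")
    return (100 * min_hits / list_size, 100 * max_hits / list_size)


# Between 17 and 20 genuine proverbs per list of 20 items.
genre_accuracy = accuracy_range(17, 20)
```

With the counts given above, `genre_accuracy` evaluates to `(85.0, 100.0)`, matching the reported 85 to 100 percent.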
2.3 The Effectiveness of AI in Identifying the Most Representative 
Paremiological Expressions (Core of Actively Used Paremiology in a Given 
Language)
All proverbs from the lists that are included in the paremiological minimum are known 
to more than 90% of speakers, with the exception of Molk je zlato (Silence is golden),
which is known to 80.7%; the remaining proverbs rank among the 200 most well-known
proverbs in Slovene. The best-known proverb is Počasi se daleč pride (lit. One gets 
far by moving slowly), recognized by 99.4% of respondents. In addition to inclusion 
in the paremiological minimum, the presence of a proverb within the broader body 
of contemporary Slovene paremiology also serves as a criterion for evaluating the 
accuracy of the AI-generated responses. The generated lists contain three proverbs that 
are neither part of the minimum nor included in the SSKJ—Nič novega pod soncem 
(Nothing new under the sun), Po jutru se dan pozna (lit. The day is known by the 
morning), and Eno jabolko na dan odžene zdravnika stran (lit. An apple a day drives the 
doctor away). These three proverbs are relatively common and may also be considered 
part of the core of Slovene paremiology, although no data on their familiarity among 
speakers is currently available. The proverb Po jutru se dan pozna is frequent in the 
metaFida 1.0 corpus (approx. 800 occurrences) and also stood out in terms of frequency 
among additional responses provided by survey participants in the study on the 
paremiological minimum (Meterc 2017: 192). All three of the aforementioned proverbs 
are already included in the SPP dictionary. One expression from the AI-generated list 
is less common in contemporary language: the form Dva petelina se na enem kupu ne 
zmenita could not be confirmed in modern usage, whereas the variant Ne moreta biti 
dva petelina na enem kupu (lit.: There can’t be two roosters on the same heap) does 
appear in contemporary texts. The following forms are found only in the Pregovori 
corpus: Ne moreta biti dva petelina na istem gnoju (lit.: There can’t be two roosters on 
the same dung heap), Kjer sta dva petelina na enem dvorišču, je vrišč (lit.: Where there 
are two roosters in one yard, there’s always a racket), Dva petelina na dvorišč je vrišč 
(lit.: Two roosters in one yard – a racket), and Ni dobro, če sta dva petelina v kurniku 
(lit.: It’s not good if there are two roosters in the henhouse).
According to the Slovene paremiological minimum, which was established through
an online survey of 527 speakers (Meterc 2023: 122–124), between 15 and 18 expressions
from each list are included, indicating an accuracy rate of 75 to 90 percent in relation to
the minimum. If we also take into account the three proverbs mentioned above (although
not covered by the minimum survey, they nonetheless belong to the core of Slovene
paremiology based on their frequency in the metaFida 1.0 corpus), then between 17
and 19 expressions per list can be considered part of the core of contemporary Slovene
paremiology, resulting in an accuracy rate of 85 to 95 percent.
In the contemporary Slovene corpus metaFida 1.0—as well as in older sources 
included in the Pregovori corpus—there is no evidence (not even in variant forms) of 
four expressions listed by the AI, such as Ne boj se zgodnjega vstajanja, boj se poznega 
ležanja (lit.: Don’t be afraid of early rising, be afraid of going to bed late). Two of these 
expressions are clear hallucinations: Delo iz človeka naredi človeka (lit.: Work makes a 
man out of a man) and Kdor trdo dela, se daleč pride (lit.: He who works hard, gets far). 
In both cases, the AI has generated expressions by combining elements from actual 
Slovene proverbs that do exist and are part of the paremiological minimum: Obleka 
naredi človeka (Clothes make the man, ‘expresses that a person’s appearance influences 
others’ perception’) and Počasi se daleč pride (Slowly, one gets far, ‘expresses that 
steady, patient effort leads to long-term success’). The form Kdor trdo dela, se daleč
pride (lit.: He who works hard, gets far) is grammatically incorrect, as the reflexive
particle se belongs to the original proverb and is incompatible with the pronoun
kdor in the newly generated expression.
2.4 The Accuracy of AI in Providing Structurally Appropriate Proverbs 
In the following section, we focus on the aspect of the conventionalized form of 
paremiological expressions. We compare the forms provided by the AI with those 
that are either corpus-confirmed variants or the most frequent, representative, and 
lexicographically standardized base forms.
The majority of the listed forms (14 to 16 per list, or approximately 70 to 80%) 
are frequently used forms that are also cited in the SPP dictionary, based on corpus-
based research on proverb variation (Meterc 2019). The accuracy of the responses is 
even higher if we consider that nearly all of these forms also correspond to the basic 
lexicographic forms (SPP), which are the most frequent in actual usage. An exception 
is, for instance, the variant Kjer je dim, tam je tudi ogenj (Where there is smoke, there 
is also fire, ‘expresses that it is reasonable to look for, acknowledge, or assume a cause 
behind something’), which is indeed attested in use and differs from the dictionary 
base form Kjer je dim, je tudi ogenj only by the addition of the adverb tam (there). The 
SPP dictionary lists as many as 15 variants of this proverb. Two forms of the same 
proverb appear on two different dates: the base form Kdor se zadnji smeje, se najslajše 
smeje (He who laughs last, laughs the sweetest, ‘expresses that final success is the 
most satisfying, especially after early triumphs by others’) and the variant Kdo se smeje 
zadnji, se najslajše smeje, which is not attested in actual use.
Among the AI-generated responses, there are also forms that deviate significantly 
from the base form, or that are entirely absent from actual usage. One such example 
is Ne glej darovanemu konju v zobe (lit.: Don’t look into a gifted horse’s teeth). In 
the Pregovori corpus, two similar forms are attested: Darovanemu konju ne glej na 
zobe and Darovanemu konju ne glej v zobe (Don’t look at a gift horse’s teeth vs. Don’t 
look into a gift horse’s teeth). Another example is Govorjenje je srebro, molčanje je 
zlato (Speech is silver, silence is golden). The closest attested variant in contemporary 
Slovene, as listed in the SPP dictionary, is Govorjenje je srebro, molk je zlato ‘expresses 
that speaking can be valuable, but staying silent is often wiser or more virtuous’. In both 
cases, it is highly likely that the English proverb influenced the AI’s response.
An interesting example is the form Pomoč prijatelja v nesreči je več vredna kot 
tisoč prijateljev v sreči (lit.: The help of a friend in misfortune is worth more than a 
thousand friends in happiness). In Slovene, the conventional form of this proverb is 
Prijatelja spoznaš v nesreči (lit.: You recognize a friend in misfortune, ‘expresses that 
true friendship is revealed in times of trouble or hardship’). It is highly likely that the 
version generated by GPT-4o is a partial hallucination that reformulates the proverb 
using a construction pattern common in both English and Slovene: One X is worth a 
thousand Y (e.g., A picture is worth a thousand words). Such hallucinations, and those
for which no clear connection to a known Slovene proverb can be found (e.g., Ne boj
se zgodnjega vstajanja, boj se poznega ležanja, lit.: Don’t be afraid of early rising; be
afraid of going to bed late), represent a distinct area of research potential.
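The comparison between a generated form and lexicographic base forms can be approximated automatically as a first screening step. The sketch below is an assumption-laden illustration, not the corpus-based search method used in this study: it relies on plain string similarity from Python’s standard `difflib` module, the function name `classify_form` is hypothetical, and the threshold of 0.85 is an arbitrary choice that would need calibration against attested variants.

```python
from difflib import SequenceMatcher
from typing import Optional


def _normalize(form: str) -> str:
    # Ignore case, surrounding whitespace and the final full stop.
    return form.strip().rstrip(".").casefold()


def classify_form(
    generated: str, base_forms: list[str], variant_threshold: float = 0.85
) -> tuple[str, Optional[str]]:
    """Label a generated form relative to a list of dictionary base forms.

    Returns ("base form" | "possible variant" | "unmatched", closest base form).
    The similarity is purely orthographic; it says nothing about attestation.
    """
    target = _normalize(generated)
    best_form, best_ratio = None, 0.0
    for base in base_forms:
        candidate = _normalize(base)
        if candidate == target:
            return "base form", base
        ratio = SequenceMatcher(None, target, candidate).ratio()
        if ratio > best_ratio:
            best_form, best_ratio = base, ratio
    if best_ratio >= variant_threshold:
        return "possible variant", best_form
    return "unmatched", None
```

A form flagged as a “possible variant” or “unmatched” would still have to be checked by hand against corpora, since surface similarity cannot distinguish an attested variant from a partial hallucination.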
2.5 The Potential for Developing a Paremiological Minimum (and Optimum) 
Based on AI-Generated Material
The theoretical and applied benefits of establishing a paremiological minimum—a list 
of the most widely known proverbs in a given language (Permjakov 1989)—and a 
paremiological optimum—a list of proverbs that are both widely known and frequently 
used (Ďurčo 2014)—have been extensively described (Meterc 2017: 40–45). One of the 
key issues in this context is the selection of paremiological material to be tested through 
surveys and corpus analysis. Since it is not feasible to test thousands of expressions, a 
well-designed preliminary selection of proverbs is essential.
A major challenge in obtaining paremiological data through artificial intelligence 
lies in the opacity of how the tool operates. It is worth noting that in commercial 
LLMs—such as all previous and current OpenAI models—the processes by which 
these models arrange and generate answers are not open or inspectable. The weights, 
training data, and fine-tuning details of GPT models are not publicly available, which 
means that the underlying sources remain obscured. Already from the results presented 
in the article (e.g., the inconsistency of forms such as Ne glej darovanemu konju v zobe 
with contemporary usage), it is evident that the model used draws on certain collections 
that include a large amount of outdated paremiological material and are published on 
websites, but does not draw, for instance, on language corpora or lexicographic sources. 
This can be verified by asking it for 20 proverbs from the Dictionary of Proverbs and 
Similar Paremiological Expressions (SPP; Meterc 2020–), as it also lists expressions 
that are not included in the dictionary (e.g., the hallucinated expression Voda na svoj 
mlin teče — lit. The water runs to its own mill). Moreover, when asked about proverbs 
not yet included in this dictionary, it lists some that actually are in it (e.g., Kjer je volja, 
je pot — lit. Where there is a will, there is a way). It also states that it does not have 
direct access to the language corpus (e.g., metaFida 1.0).
Another problematic aspect is the non-reproducibility of the research: questions such 
as those presented in this article often yield slightly different responses each time. This 
also renders the ranking of results for similar prompts within paremiological research 
unreliable. What can be regarded as reliable, however, is the overlap of proverbs 
that can serve as a basis for subsequent human analysis and for the construction of 
a paremiological minimum. The partial non-reproducibility can be turned to our 
advantage if the goal is to collect as much relevant paremiological material as possible 
or to broaden the results of a targeted selection requested from an LLM. Thus, the
greatest contribution of such models to research may lie in the very early stages of 
minimum construction.
We can conclude that it was necessary to verify whether differences in wording 
regarding the familiarity, frequency, or popularity of proverbs affect the results. This 
influence was not confirmed, as the model responds to the question about familiarity 
as if it were about frequency. For two very similar questions — (A) Please generate a 
list of the 20 most common and widespread Slovene proverbs, and (C) Please generate 
a list of the 20 Slovene proverbs that are most common in written texts — which were 
submitted on two different dates (5 and 17 June 2024), the model provides two distinctly 
different lists of proverbs. However, the most overlapping lists are those generated in 
response to questions submitted on the same date (A and B on one hand, and C and D 
on the other). 
Across the four responses collected on two different dates, there are 33 distinct 
proverbs that fall within the paremiological minimum. A comparison of the first list 
from 5 June 2024 (A) with the first list from 17 June 2024 (C) shows that, of 32 
different proverbs that belong to the paremiological minimum, four appear in both 
lists. Rarely used proverb forms and hallucinations tend to recur on the same date, 
but not across different dates. We may therefore expect that posing additional 
questions on multiple dates would yield an overlapping set of relevant proverbs. This 
set could subsequently be used in a survey to determine the paremiological minimum 
and in a corpus-based frequency analysis aimed at transforming the paremiological 
minimum into a paremiological optimum, following the methodology proposed by 
Peter Ďurčo (2014: 189–201). 
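The idea of retaining only those proverbs that recur across query dates can be illustrated with a minimal Python sketch. It is not the procedure used in this study: the function names `normalize` and `recurring_across_dates`, the `min_dates` threshold, and the toy lists are assumptions made for illustration, and the example lists are not the lists obtained from the model.

```python
from collections import Counter


def normalize(proverb: str) -> str:
    """Lowercase and drop punctuation and extra spaces so trivial variants match."""
    kept = "".join(ch for ch in proverb.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def recurring_across_dates(responses: dict[str, list[list[str]]],
                           min_dates: int = 2) -> set[str]:
    """Return normalized proverbs that occur on at least `min_dates` distinct dates."""
    date_counts: Counter[str] = Counter()
    for lists_for_date in responses.values():
        # Count each proverb once per date, however many lists it appears in.
        seen_on_date = {normalize(p) for lst in lists_for_date for p in lst}
        date_counts.update(seen_on_date)
    return {p for p, n in date_counts.items() if n >= min_dates}


# Toy input (hypothetical list membership): date -> lists generated on that date.
responses = {
    "2024-06-05": [
        ["Rana ura, zlata ura.", "Kdor čaka, dočaka."],
        ["Rana ura, zlata ura.", "Vaja dela mojstra."],
    ],
    "2024-06-17": [
        ["rana ura zlata ura", "Lepa beseda lepo mesto najde."],
        ["Kdor ne dela, naj ne je."],
    ],
}

core = recurring_across_dates(responses)
print(sorted(core))
```

Counting each proverb once per date rather than once per list reflects the observation that same-day lists overlap heavily, so recurrence within a single date is weak evidence of relevance.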
3  CONCLUSIONS
For the average internet user, artificial intelligence offers a simple and fast alternative for 
obtaining information about representative, frequent, or well-known proverbs in a given 
language. It serves as an alternative to (1) dictionary sources or (2) the heterogeneous 
and often chaotic array of online materials, such as digitized older collections, forum 
posts and non-specialist articles focusing on typical proverbs in a particular language. 
AI is more user-friendly, requiring less prior knowledge and enabling quicker access 
to relatively reliable information. This is particularly important for non-expert users, 
who are only able to assess the reliability and representativeness of sources to a limited 
extent. The issues mentioned above regarding the non-reproducibility of results may, 
at present, be best addressed through simple methodological adaptations. These steps 
can improve accuracy, or at least minimize the number of factors that might affect the 
outcomes of a given prompt:
1. the prompt should be used in a fixed and isolated form, free of additional 
context and within a new conversation (chat);
2. a deterministic form of the response should be specified, for example by 
including an instruction such as “Do not vary your selection or wording in 
future responses”;
3. the results generated from such prompts should be saved and treated as fixed 
datasets (as shown in Table 1).
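One possible way of treating a saved response as a fixed dataset is sketched below in Python. The record layout, the field names, and the helper functions `build_record` and `verify_record` are assumptions introduced for illustration; the sketch does not query any model and only shows how a stored list can be protected against later, unnoticed changes by means of a checksum.

```python
import hashlib
import json


def _canonical(body: dict) -> bytes:
    """Serialize with sorted keys so the same content always yields the same bytes."""
    return json.dumps(body, ensure_ascii=False, sort_keys=True).encode("utf-8")


def build_record(prompt: str, model: str, date: str, items: list[str]) -> dict:
    """Bundle the prompt, model label, date and returned items with a SHA-256 checksum."""
    body = {"prompt": prompt, "model": model, "date": date, "items": items}
    return {**body, "sha256": hashlib.sha256(_canonical(body)).hexdigest()}


def verify_record(record: dict) -> bool:
    """Check that the stored content still matches its checksum."""
    body = {k: v for k, v in record.items() if k != "sha256"}
    return hashlib.sha256(_canonical(body)).hexdigest() == record["sha256"]


# Placeholder items stand in for the proverbs returned by the model.
record = build_record(
    prompt="Please generate a list of the 20 most common and widespread Slovene proverbs.",
    model="ChatGPT-4o",
    date="2024-06-05",
    items=["<proverb 1>", "<proverb 2>"],
)
print(json.dumps(record, ensure_ascii=False, indent=2))
print(verify_record(record))
```

Storing the exact prompt wording and the query date next to the list keeps the two factors discussed above (wording and date) recoverable for every dataset.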
Based on Slovene empirical data on proverb familiarity and corpus data on 
frequency, our research has demonstrated that the responses provided by ChatGPT-4o 
are, to a large extent, successful:
− 85–100% accuracy in terms of genre-specific features of proverbs;
− 85–95% accuracy in terms of the representativeness of the proverbs with 
respect to the core of Slovene paremiology;
− 70–80% accuracy in terms of the structural form of the proverbs.
For the specialized user, such as a phraseologist, paremiologist, or 
paremiographer, the AI-generated responses represent a useful body of 
paremiological material that must be verified using additional empirical data (e.g., 
frequency, familiarity, and the conventionalization of form). With the help of the 
selected LLM, it is possible to compile high-quality lists of the most relevant 
paremiological expressions. By repeating queries and combining results, followed 
by selection based on corpus analysis, it may be possible to obtain larger sets of 
representative expressions. This is particularly valuable for phraseographic and 
paremiographic work, and potentially also more broadly, for example in 
phraseodidactics and paremiodidactics.
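The subsequent verification step can be sketched in the same hedged way. The following Python fragment assumes that each candidate has already been assigned a corpus hit count and a survey familiarity share; the `Candidate` class, the thresholds `min_hits` and `min_familiarity`, and all values are hypothetical and do not come from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str
    corpus_hits: int    # occurrences found in a reference corpus (hypothetical)
    familiarity: float  # share of survey respondents who know the item, 0..1


def select(candidates: list[Candidate], min_hits: int,
           min_familiarity: float = 0.5) -> tuple[list[Candidate], list[Candidate]]:
    """Split candidates into a familiarity-based set and its frequency-filtered subset."""
    # Assumed reading: "minimum" = familiar to more than the given share of respondents.
    minimum = [c for c in candidates if c.familiarity > min_familiarity]
    # Assumed reading: "optimum" = members of the minimum that are also frequent enough.
    optimum = [c for c in minimum if c.corpus_hits >= min_hits]
    return minimum, optimum


# Placeholder candidates with invented values, for illustration only.
candidates = [
    Candidate("<proverb a>", corpus_hits=120, familiarity=0.9),
    Candidate("<proverb b>", corpus_hits=3, familiarity=0.8),
    Candidate("<hallucinated form>", corpus_hits=0, familiarity=0.1),
]

minimum, optimum = select(candidates, min_hits=10)
print([c.text for c in minimum])
print([c.text for c in optimum])
```

Hallucinated forms would be expected to drop out at the first filter, while rarely used but familiar proverbs would remain in the familiarity-based set and drop out only at the frequency filter.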
Through previous empirical analyses of Slovene paremiology in terms of familiarity 
and frequency, we have confirmed the relevance of the AI-generated lists, which could 
be used as a basis for developing paremiological minimums and optimums in languages 
where such benchmarks have not yet been established. Artificial intelligence could 
serve as a useful tool for the preliminary selection of paremiological expressions before 
they are presented to large groups of survey participants and analyzed within linguistic 
corpora.
The hallucinated forms we obtained from the LLM have strong research potential. 
As they resemble, to varying degrees and in different ways, proverbs that exist or 
once existed in the given language, it would be interesting for future research to 
analyze how they are formed, which structural patterns recur in them, and to what 
extent the existing proverbs from which they derive can be identified. This could 
serve as a valuable comparative addition to studies on how genuine proverbs are formed. 
Sources
BABIČ, Saša; et al. (2023) Collection of Slovene paremiological units Pregovori 
1.1. Slovene language resource repository CLARIN.SI. 
http://hdl.handle.net/11356/1853 <31 March 2025>
ERJAVEC, Tomaž (2023) Corpus of combined Slovene corpora metaFida 1.0. Slovene 
language resource repository CLARIN.SI. 
http://hdl.handle.net/11356/1775 <31 March 2025>
METERC, Matej (2020–) Slovar pregovorov in sorodnih paremioloških izrazov. 
https://www.fran.si <31 March 2025>
OpenAI. 2024. ChatGPT-4o model. https://openai.com/chatgpt <31 March 2025>
eSSKJ: Slovar slovenskega knjižnega jezika. https://www.fran.si <31 March 2025>
Slovar slovenskega knjižnega jezika. https://www.fran.si <31 March 2025>
Bibliography
DE SCHRYVER, Gilles-Maurice (2023) Generative AI and Lexicography: The Current 
State of the Art Using ChatGPT. International Journal of Lexicography 36/4, 
355–387. 
ĎURČO, Peter (2014) “Empirical Research and Paremiological Minimum.” In: H. 
Hrisztova-Gotthardt/M. A. Varga (eds), Introduction to Paremiology: A Comprehensive 
Guide to Proverb Studies. Warsaw: Versita, 183–205. 
GRZYBEK, Peter/Christoph CHLOSTA (2008) “Some Essentials on the Popularity 
of (American) Proverbs.” In: K. J. McKenna (ed.), Festschrift on the Occasion of 
Wolfgang Mieder’s 65th Birthday. Burlington: University of Vermont, 95–110. 
JAKUBÍČEK, Miloš/Michael RUNDELL (2023) “The End of Lexicography? Can 
ChatGPT Outperform Current Tools for Post-Editing Lexicography?” In: M. 
Medveď/M. Měchura/C. Tiberius/I. Kosem/J. Kallas/M. Jakubíček/S. Krek (eds), 
Electronic Lexicography in the 21st Century (eLex 2023). Proceedings of the eLex 
2023 Conference, 518–533. 
KRŽIŠNIK, Erika (2008) “Viri za kulturološko interpretacijo frazeoloških enot.” Jezik 
in slovstvo 53/1, 33–47. 
METERC, Matej (2017) Paremiološki optimum: najbolj poznani in pogosti pregovori 
ter sorodne paremije v slovenščini. Ljubljana: Založba ZRC, ZRC SAZU. 
METERC, Matej (2019) “Analiza frazeološke variantnosti za slovarski prikaz v 
eSSKJ-ju in SPP-ju.” Jezikoslovni zapiski 25/2, 33–45. 
METERC, Matej (2023) “Izbiranje iztočnic za Slovar pregovorov in sorodnih 
paremioloških izrazov: želje, merila in empirični podatki.” In: M. Jesenšek (ed.), 
Pleteršnikovi dnevi: ob stoletnici smrti Maksa Pleteršnika: program simpozija in 
povzetki referatov: Pišece, Ljubljana, Cankova, 12.–14. september 2023. Ljubljana: 
Slovenska akademija znanosti in umetnosti, 29–30. 
METERC, Matej/Nataša JAKOP (2016) “Lexikografické spracovanie frazeologických 
variantov v novom slovníku slovinského spisovného jazyka.” In: M. Lišková (ed.), 
Akademický slovník současné češtiny a software pro jeho tvorbu aneb Slovníky a 
jejich uživatelé v 21. století: sborník abstraktů z workshopu, Praha, 29.–30. 
listopadu 2016. Praha: Ústav pro jazyk český AV ČR, 55–56. 
MIEDER, Wolfgang (2004) Proverbs: A Handbook. Westport: Greenwood Press. 
MLACEK, Jozef (1983) “Problémy komplexného rozboru prísloví a porekadiel.” 
Slovenská reč 48/2, 129–140. 
PERMJAKOV, Grigorij Lvovič (1970) Osnovy strukturnoj paremiologii: Zametki po 
obščej teorii kliše. Moskva: Nauka.
PERMJAKOV, Grigorij Lvovič (1989) “On the Question of a Russian Paremiological 
Minimum.” Proverbium 6, 91–102. 
Abstract
THE BEST KNOWN AND FREQUENTLY USED SLOVENE PROVERBS ACCORDING TO 
CHATGPT-4O: EXPLORING THE POTENTIAL FOR AN AI-BASED PAREMIOLOGICAL MINIMUM
The article examines the type of data yielded by the publicly accessible AI model 
GPT-4o concerning the core of Slovene paremiology, namely the most widely known 
and/or most frequently used proverbs. The proverbs identified by the model are 
compared with data obtained from corpus-based analyses, survey research, and 
established Slovene paremiographical sources. Drawing on these comparisons, this 
study outlines the current strengths and limitations of using this AI model in 
paremiological research and evaluates its potential impact on contemporary proverb 
studies.
Keywords: Slovene paremiology, large language model, GPT-4o, paremiological 
minimum, paremiological optimum
Summary
THE BEST-KNOWN AND FREQUENTLY USED SLOVENE PROVERBS AS SELECTED BY 
CHATGPT-4O: EXPLORING THE POTENTIAL OF ARTIFICIAL INTELLIGENCE FOR 
ESTABLISHING A PAREMIOLOGICAL MINIMUM
The article shows what data the responses of the freely accessible AI model known 
as GPT-4o offer on the core of Slovene paremiology, that is, the best-known and/or 
most frequent proverbs. The sets of expressions listed are compared with the data on 
familiarity and frequency available from corpus and survey research and from Slovene 
paremiographic sources. On the basis of the data obtained, the current strengths and 
weaknesses of this use of the model and of its influence on contemporary 
paremiological research are highlighted.
Keywords: Slovene paremiology, large language model, GPT-4o, paremiological 
minimum, paremiological optimum